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This paper proposes a welcome hypothesis: a computationally simple device is sufficient for processing 
natural language. Traditionally it has been argued that processing natural language syntax requires very 
powerful machinery. Many engincers have come to this rather grin conclusion: almost all working parsers 
are actually Turing Machines (TM). For example, Woods specifically designed his Augmented ‘Transition 
Networks (A‘ENs) to be ‘Turing Equivalent. If the problem is really as hard as it appears, then the only 
solution is to grin and bear it. Our own position is that parsing acceptable sentences is simpler because there 
are constraints on human performance that drastically reduce the computational requirements (time and 
space bounds). Although ideal linguistic competence is very complex, this observation may not apply directly 
to a real processing problem such as parsing. By including performance factors. it is possible to simplify the 
computation. We will propose two performance limitations. bounded memory and deterministic control, which 


have been incorporated in a new parser YAP. 
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1. Introduction 


‘This paper proposes a welcome hypothesis: a computationally simple device! is sufficient for processing 
natural language. ‘Traditionally it has beew argued. that processing natural language syntax requires very 
powerful machinery. Many engineers have come to this rather grim:conclusion; almost all working parsers 
are actually ‘Turing Machines (1'M)2. For'example, Woodsispccifically designed his, Augmented ‘Transition 
Networks (ATNs) to be luring Equivalent. 


(1) “It is well known (cf. [Chomsk y64)]) that the strict context-free grammar model is not an adequate 
mechanism for characterizing the subtletics ef natural. languages ..., When conditions and actions , 
are added ‘to the arcs, tie model attains the power of a ‘Turing, machine, although the basic 
operations which it perforis are ‘natural’ oncs.for language. analysis. Using these conditions and 
actions, the model. is ‘capable of performing the cquivaleat of transformational analysis without 
the need for a separate transformational compoacat.” {Woods70) . 


If the problem is really as hard as it appears, then the only, solution is to grin. and bear it. Our own position is 
that parsing acceptable sentences is simpler because there are constraints on human performance that exclude 
all the "harder" cases. A real parser catt take advantage’ ‘of these performance. constraints (c.g. limited 
memory) so that it can be simpler and more efficient than Woeds’ ideal’ model which is designed to parse the 


entire competence grammar. 


1. Throughout this work, the gomplexity notion will be used in its computational sense as a measure of time and space 
resources required by an optimal processor. The term. will not be used in the linguistic sense (ic. the size of the grammar 
itself), da. general. one can trade one off for,the other, which leads to considerable confusion. The size of a program 
(linguistic complexity) is typically inversely related to the peeeeut the ititerpretet (computational complexity). This point 
is discussed more thoroughly in chaptef 6. 

2. [tis important lo distinguish computational ened (time sind space bounds) from computational class (finite state 
FS. context free CF, context sensitive CS. turing machine TM). A grammar that describes a kirge class is generally more 
difficult to process than a more tightly constrained granmiar. “For exanipte. FS! grammars can be parsed. with bounded 
Space: all others consume unbounded space. Similar comments probably bold for time complexity, too (hough the proof 
is an open problem.) That is. FS grammars can be parsed in linear tine, whereas CF grammars probably require more 
lime in the worst case. 

3. In fairness to the ATN and Transformational ©... nar, it should be noted that there have been efforts to reduce the 
generative capacity. For example, Kaplan (personal communication), [Wogds73] and. [Peters and Ritchie73] discuss: 
various restrictions to ussure decrlability. Unfortunately, this move is nat suffigient lo guaranties efficient (¢.g. polynomial 
time) processing: parsing decidable grammars may be effective, but it is hardly efficient. 
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1.1 The Competence/Performance Dichotomy 


The approach crucially depends on performance constraints to shrink the scarch space of possible derivations. 
Formerly engineers such as Woods attempted to model campetence without. performance constraints, and not 
surprisingly, they found they acceded inordinate resources jo.do so. We suggest that a real processor 
incorporates both competence (grammar) and performance (time and space) constraints, Hence it is possible 
to build a small efficient processor by exploiting the performance model... This is particularly clear from 


Chomsky’s original description of the performance/competence dichotomy. 


(2) “Linguistic theory is concerned primarily with an ideal. speaker-listencr, in a completely 
homogeneous speech-community, who knows its language perfectly and is unaffected by such 
grammatically irrelevant conditions as memory Himitatiuns, distractions, shifts,of attention and 
interest, and errors (random or characteristic) in applying his knowledge of the language in 
actually performance .... We thus make a fundamental distinction between competence (the 
speaker-hearer’s knowledge of his language) and performance (the actual use of language in 
concrete situations)." [Chomsk y65, pp 3-4, italics added} = 


The proposed model is more efficient and mare restrictive than Woods’ A'TN, It is more efficient because it 
docsn’t have the resources to waste and it is more restrictive because it doesn't have the resources to explore as 
miuany possibilities. For example, there are some sentences which will require a very long time on an ATN; 
our model will reject these sentences as unacceptable (not in the performance model) because it doesn’t have 
the time to figure it out. We belicve there are two reasons for rejecting sentences; a sentence may be 
ungrammatical (excluded from competence) or it may be unacceptable (excluded from performance).4 ‘The 


term acceptable was coined by Chomsky to refer to: 


(3) "... utterances that are perfectly natural and immediately comprehensible without paper-and- 
pencil analysis, and in no way bizarre or outlandish ...'Ihe more acceptable sentences are those 
that are more likcly to be produced, more casily understood, less.clumsy, and in some scnse more 
natural. ‘The unacceptable sentences one would tend to aveid and replace by more acceptable 
variants, wherever possible in actual discourse.” [Chomsky65, pp. 10]. — 


4. This position should be distinguished from Kaplan's hypothesis (personal conmmunication) that the processing 
grammar is identical to the competence grammar. We suggest that there are some extra-grammiatical factors (c.g. memory 
limitations) which distinguish the two. 


The Competence/ Performance Dichotomy (s4l- Section 1.1 


Acceptability is assigned independently from grammatical the four logical possibilitics are illustrated by 
(45 


. (4) Itis raining. 
# Tom figured that. that Susan. wanted to take th the cat out bothered Betsy out. 
*They am running. 
*#'Tom and slept the dog. 


gap i 


Chomsky formulated this distinction in order’ to separate érclevant processing constraints (c.g. limited time 
and spacc) from the grammaticality questions which he has been studying. Our hypothesis that a.simple 
device can process language, is then, by definition, a hypothesis about the performance model. Acceptability 


judgments will bear crucially on the matter. 


The problem is to design a parser that approximates competence with ‘realistic resources. Unacceptable 
sentences should be excluded because they require inordinate resources to process; ungrammiutical sentences 
should be rejected because they violate competence ideatizatlot itis (oF ifproximations thereof). ‘The design 
criteria arc summarized below: 


(5) What are some reasonable performance approximations? 
(6) How can they be implemented without sacrificing linguistic gencralizations? 


1.2 The FS Hypothesis 


We will assume a severe processing limitation on available short term memory (STM), as commonly 
suggested in the psycholinguistic literature ({F razicr79}, [Frazier and Fodor78}, [Cowper76], [Bresnan78], 
{Kimball73, 75}, [Chonisky61)).. ‘Vochnically a machine with limited 7 memory is a ‘finite State machine (FSM) 
which has very nice compu tational propertics when compared: to an urbiteary YM: Most importantly, a FSM 
requircs less time and space in the worst casc. ‘There are sane other advantages | which we have not explored 


5. These examples are taken from [Kinball73), A bash mark (#) is used to indicate unaceeplabilily: an asterisk (*) is 
used in the traditional fashion to denote ungrammiaticalily. 

6. Just as Chomsky idealized granamaticality from. ther unexplained irrclevaal factors, it will be useful to idealize 
acceptability. In thisework, we arcmoustinteresicd in kine and. space behavior i in the limit as sentences grow: we will not 
address borderline cases where jadgments tend lo: be extromely,.varebie. This move is oficn taken in conplexily 
arguments which study limiting growth, but ignore constants (burdertine cases). 


a ca 2 ame Ege. | 
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in detail. For example, it is casier to nin a FSM in reverse. ‘lhis may have some important implications if one 


were attempting to build a single model for both production and gencration as suggested in [Kay75].’ 


When discussing certain performance issucs (c.g. center-embedding),® it will be most uscful to view the 
processor as a ISM; on the other hand, competence phenomena (e.gi'subjacency)? suggest a morc abstract 
point of view. Because of a lack of TM resources, the processor cannot litcrally apply rules of competence; 
rather, it resorts to more computationally realistic approximations. Whenever a coinpetence idealization calls 
for inordinate resources, there will be a discrepancy: between the a ela idealization and its performance 


realization. 
1.2.1 Center-embedding 


Chomsky and Bar-Hillel independently showed that (arbitrarily deep) center-embedded structures require 
unbounded memory [Chomsky59a, b] [Bar-Hillet61) {I. angendocn75}. As predicted, center-embedding is 


severcly compromised in performance; it quickly becomes unacceptable, even at relatively shallow depths. 


(7) #[The man [who the boy [who the students recognized] pointed out] is a fricnd-of mine] 


(8) #[The rat [the cat [the dog chased] bit]‘ate the cheese.} 


7. Trivially all physical machines are FSMs, The FS hypothesis i is interesting. “though. because the memory limitation is 
sy severe (ie. two of three clases) that itis a crucial issue’ in ‘Puiny ‘practical: situations.: Similar: comments can be made 
about modern computers, “Most cnginecrs would model a iypigal large computer system as a TM. However, it would be 
_hard to think ofa computer asa TM if it had only t bit of memory. How much memory does it take before a FSNfis best 
modeled as a TM? The answer may depend-on the current price o&memory.2 What once seemed unreasonuble, may not 
be so unrealistic today. 
8. A centereembedded sentence contains an embedded clause surrounded by lexical material from the higher clause: 
[, x|,- . Jy). where both x and y contain lexical material. 
9. Subjacency isa formal linguistic notion which constrtins the fpplicability:uf'a transformation. (Informally, subjacency 
is a locality principle: all transformations must be local to a single. cyclic pode (age clause) or to.two adjacent cyclic nodes.) 
We offer subjacency as an example of a competence idealization. In generat tdsaugh: it is extremely difficult to prove that 
a particular phenomenon is necessarily a matter of contpétetice: We have no proof that -subjacency is a competence 
universal, and similarly. we have no proof that center-embedding 4 a ev universal, Our assessments are most 
plausible. though conceivably, they might be incorrect. : 
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A memory limitation provides a very attractive account of center-embcedding phenomena (in the limit).!° 


(9) "This fact [that deeply center-embedded sentences are unacceptable], and this atone, follows from 
the assumption of finiteness of memory (which no one, surely, has cver questioned)." 
{Chomsky61, pp. 127] : . 


1.2.2 Respectively 


What other phenomena follow from a memory limitation? Center-embedding is the most striking example, 
hae Oe a i 

but it is nef unique. ‘There have been many refutations of FS competence models; cach one illustrates the 

point: computationally complex structures are unacceptable. Consider the respectively construction" which is 


’ 


notorious for its crossing dependencies. !2 As predicted, it too becomes rapidly unacceptable. 


(10) John and Jack. knew ‘Tim and Mike, respectively. 
Mohn, Jack and Sam knew ‘tim, Mike and Rob, respectively. 
"John, Jack, Sam, and Tom knew ‘Tim, Mike, Rob and Bill, respectively. 
MNohn, Jack, ..., Sain, and ‘Tom knew ‘Tim, Mike, .... Rob and Bill, respectively. 


1.2.3 Lasnik's Noncoreference Rule 


Lasnik’s noncoreference rule [lasnik76] is another source of evidence. ‘The rule observes that two noun 


phrases in a particular structural configuration are noncore ferential. 


10. A complexity argument of this sort does not distinguish between a depth of three or a depth of four. It would require 
considerable psychological experimentation to discover the preeise dimitations. This account predicts that all 
center-embedded structures eventually become unacceptable although it is possible that certain constructions become 
unacceptable more rapidly than others, For example. [Cowper76} has found some differences between relative clauses 
and complement clauses. 

HL. [Bar-Hillcl6t] argued that respectively proves the competence model is not CF. It has been widely suggested that 
respectively is really a semantic issuc which shouldnt concern syntax. Although this point is well taken. there are 
MumMerous analogous constructions (Dutch verbs [Huybregis76]. Swedish wh-movement. and Mohawk [Postal64]) which 
pose the same problem. [fall of these arguments are mistiken and granimar is in fact only CF. then it is even easicr to 
defend the FS Hypothesis. (Only center-embedding would have to be excluded.) 

12. Crossing dependencies are beyond CF complexity. The proof uses the pumping lenima. [Huybregts76] 

13. fe can be argued that this rule is nota syntactic rule and hence itis irrelevant to the FS hypothesis. Actually. we 
believe that the FS hypothesis is more general it applies at aff levels of linguistic processing, not just the syntactic 
component. 
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(11) ‘The Noncoreference Rule: Given two noun phrases NP), NP, in a sentence, if NP) precedes and 


commands! NP) and NP, is not a pronoun, then NP; and NP» are noncoreferential. 


For example, each John in (12) must refer to different people, since the first John both precedes and 
commands the second, ‘This rule has unbounded consequences; it applies even when there are an arbitrary 
number of clauses between NP} and NP. Consequently, unbounded memory is required to process the rule; 
it becomes harder and harder to enforce as more and more names are mentioned. His rule is part of a 
competence model; in performance, it seems necessary to approximate the rule. As the memory requirements 
grow, the performance model is less and less likely to establish the noncorcferential link. In (12), the 
co-indexed noun phrases cannot be coreferential. As the depth increases, the noncoreferential judgments 


become less and less sharp, even though (12)-(14) are all equally ungrammatical.!5 


(12) *# Did you hear that John; told the teacher that John, threw the first punch. 
(13) *??Did you hear that John; told the teacher that Bill said that John; threw the first punch. 
(14) * Did you hear that John; told the teacher that Bill said that Sam thought John; threw the first 


punch, 


Ideal rules of competence do not (and should not) specify real processing limitations (c.g. limited inemory); 
these are Inatters Of performance. (12)-(14) do not refute Lasnik’s rule in any way; they merely point out Mat 


its performance realization has some important empirical differences from Lasnik’s idealization. 
1.2.4 Convergence 


On the other hand. there are idealizations which can be realized in performance without approximations. For 
osample. itseems that movement phenomena can cross unbounded distances without degrading acceptability. 


Compare this with the center-embedding and respectively examples previously discussed. !® 


I Trtormally. a phrase precedes phrases to its right. For example. x precedes vint wx... yu. A phrases commands 


Phioess th subordinate clauses. That is, x commands each yon [Xd Vp tg edg See [Lasik 76] for more 
SCUSSION, 

PS) Some re formants report that they noticed noncoreference, but chose to ignore it in the mure complex cases. This 
Scots co con F TCE WIT our account that itis too difficult to establish the noncoreference Tinks. 

(6 We opioid used Uae same verbs to ihustrate the recursive nature of these constructions. They would be more 


sires aceptatde ave tised different verbs. 
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(15) There seems to seein ... to be a problem. tas spt. Rin -move-np 
(16) What did Bob say that Bill said that... John liked? ; move-wh 


We claim that center-embedding and respectively demand tinbounded résoutees whercas movement has a 
bounded cost (even in the worst casc).!7 We will argue in chaptets'5, 6 nnd 8that a machine can process 
unbounded movement with very limited resources. Movement phenomena (unlike center-embedding) can be 
implemented in a performance model without approximation. Iti isa welcome result when performance and 
competence happen to converge, as in the movement case; there will be no empirical differences between the 
idealization and its realization. However, there is no logical necessity that performance and competence will 
ultimately converge in every area. The FS hypothesis, if correct, would necessitate compromising many 


competence idcalizations.!8 


3 ‘The Proposed Model: YAP 


Some psycholinguists believe there is a natural mapping from idcal competence onto a realistic processing 
model. ‘This hypothesis is intuitively attractive, even though there is no logical reason that it need be the 
case.!9 Unfortunately, the psycholinguistic literature docs, not preciscly describe a mapping which is 
consistent with our FS hypothesis.” We have ‘implemented a parser (YAP) which behaves like a complex 
competence model on acceptable cascs, but fails to parse more difficut unacceptable sentences. ‘This 


sbftts 
7 


performance model looks very similar to the more ‘complex competence machine on acceptable sentences 


17. The human processor may not be optimal. . The functional argument observes that an optimal processor could 
process unbounded movement with bounded resources. This should encourage further ee: but it alone is not 
sufficient evidence that the human processor has-optimal properties so 

We claim movement will neyer consume more than a bounded COS. the cost, is. independent of. the length, of the 
sentence. Some movement sentences may be easier than others. For exampte, there is considerable experimental 
evidence suggesting that subject relatives (a) are easier than object relatives (b). 


(a) | saw the boy who liked you. 
(b) | saw the boy who you liked. 


However. we believe the difference between (a) and (b) is independent of their lengths. 

18. We have given only three examples: center-cnibedding, Crossing dependenci¢s and noncoreference although there 
are many more. Center-cmbedding and crossing dependencigs. were, intgndad (0-be iMustrative of structural limitations; 
noncoreference is typical of interpretive rules (such as pronominal binding). 

19. Chomsky and Lasnik (personal communication) have each suggested ual she-competerce model might generale a 
non-compulable set. IF this were indeed the.case. i woutc. -cem unlikely, that. there.could be a mapping. 

20. Chart parsers (such as GSP (Kaplin73)) do not satisfy our requirements for a psychologically. realistic mapping since . 
they. are inconsistent with our FS hypothesis, [Cis nut desp haw chart parsers can uecount for ihe evidence in favor of the 
FS hypothesis. 
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even though it “happens” to run in severely limited memory. Since it is a minimal augmentation of existing 
psychological and linguistic work, it preserves their accomplishments, and in addition, achieves computational 
advantages. Chapter 2 will discuss the particular augmentation which allows YAP to conserve memory, and 


hence reduce complexity to that of a FS machine. 


The basic design of YAP is similar to Marcus’ Parsifal [Marcus79], with an additional limitation on memory. 
His parser, like most stack machine parsers, will occasionally fill the stack with structures it no longer needs, 
consuming unbounded memory. ‘To achieve the finite memory limitation, it must be guaranteed that this 
never happens on acceptable structures. ‘That is, there must be a "forgetting" procedure (like a garbage 
collector)?! for cleaning out the stack so that acceptable sentences can be parsed without causing a stack 
overflow, Everything on the stack should be there for a reason: in Marcus’ machine it is possible to have 
something on the stack which cannot be referenced again. Equipped with its forgetter, YAP runs on a 


bounded stack even though it is approximating a much more complicated machine (c.g. a PDA).22 
1.4 Closure Strategies 


The forgetting (closure) notion is crucial to this thesis; it enables YAP to parse unbounded structures with 


3 There arc two closure procedures mentioned in the psycholinguistic literature: 


only finite memory. 
Kimball's early closure [Kimball73, 75] and Frazier’s late closure [Frazier79] [Frazier and Fodor78]. We will 
argue that Kimball’s procedure is too ruthless, closing phrases too soon, whereas Frazicr’s procedure is too 
conservative, wasting memory. Admittedly it is easier to criticize than to offer constructive solutions, 
Chapter 2 will develop some tests for evaluating solutions, and then propose a compromise which should 
perform better than cither of the two extremes, early closure and late closure, but it will hardly be the final 
word. The closure puzzle is extremely difficult, but also crucial to understanding the seemingly idiosyncratic 


parsing behavior that people exhibit. 


21. The “garbage collection” analogy is not completely accurate. Garbage collectors return storage to the system when it 
is Anowa that cannot be referenced again: closure procedures return storage when is it suspected that it will not be 
referenced again, 

22, A push down automaton (PDA) is a formatization of unbounded stack machines. 

23. Bounded memory was the original motivation for closure. Some closure formulations are heuristic: they close a 
phrase betare iis Avowe that the phrase in question cannot be referenced again. Theoretically, though. closure need not 
be heuristic: it is possible for a FSM to parse non-center-embedded CF structures without heuristics. We have opted fora 
heuristic formulation which appears to more practical (as we will argue in the next section). 
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1.5 Marcus’ Determinism Hypothesis 


‘The memory constraint becomes particularly interesting when it is combined with acontrol constraint such as 

Marcus’ Determinism Hypothesis {Marcus79], ‘The Determinism H ypothesis: claims that once the processor is 
committed to a particular path, it is extremely difficult to select an altcrnative. F OF example, most readers will 
misinterpret the underlined portions of (17)-(19) and then. have considerable difficulty continuing. For this 
reason, these unacceptable sentences arc often called Garden Paths (GP). A ingmory limitation alone fails to 
predict the unacceptability of (17)-€19) because GPs don’t cepter-embed very deeply (and hence there exits a 
FSM which could parse these GP sentences). Deterninism offers an, additional constraint on memory 


allocation which provides an accawnt for the data. 


(17) # The horse raced past the barn fell. 
(18) # Joho lifted a hundred pound bags. 
(19) #1 wold the boy the dog bit Sue would help him. 


There have been many other attempts to capture the same: intuitive ‘notion. - Kimball's Processing Principle 
{Kimball73], McDonald's Indeclibity Stipulation [Mclonald79], and = Frazicr’s “shunting” notion 
[Frazicr and Fodor78] are typical cxamples from the psycholinguistic literature. ‘The “shunting” notion 
assigns a high cost to backing up past a phrase that has been “shunted” from’'one stigt to another. 


(20) Indelibility: “Once a linguistic decision has been made, it cannot be retracted -- it has been 
written with ‘indclible ink’ ... It requires every choice made during the production process, at 
whatever Ievel, cannot be changed once it has been thade ee “choites init "be thade ‘correctly the 
first time.” [McDonald79, pp. 16] 


(21) “Principte Seven (Processing): When a phrase is closed, it is pushed down into a syntactic 
(possible semantic) processing stage ‘and cleared from shirt “term ala . iK linball73 pp. 39} © 


a 


4%. 
res 


24. There are other possible accounts which may be Very similar ty Mareus’ account: For example, GPs are oflen related 
to backup in non-deterministic frameworks.” However. 1s not cleur hew:such an account can distinguish backup on a GP 
from backup on an acceptable sentence. Ofte solution places a bound On ‘backup to-enuble the parser to backup on the 
acceptable sentences but not on GPs. th:some sense: this is very similarite Mitrcus! approach: he provides a bound on 
lovkahead (analogous to bounded backup) which distinguishes GPs from acceptible sentences. 
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Although the “determinism” notion is widely discussed in the literature, it is extremely difficult to describe 
precisely. At first we believed the memory constraint alone would subsume Marcus’ hypothesis, thus 
providing a precise independently motivated account. Since all FSMs have a deterministic realization,2> it 
was originally supposed that the memory limitation guaranteed that the parser is deterministic (or equivalent 
to one that ts), Although the argument is theorctically sound, it is mistaken.2° ‘The deterministic realization 
may have many more states than the corresponding non-deterministic FSM. ‘These extra states are extremely 
costly and lack empirical justification. They would enable the machine to parse GPs by delaying the critical 
decision” In spirit, Marcus’ Determinism Hypothesis excludes encoding non-determinism by exploding the 
State space in this way: it assuincs that most exploded states are not reachable in performance. ‘This amounts 
to an exponential reduction in the size of the state space, which is an interesting claim, not subsumed by FS 


(which only requires the state space to be finite).28 


The forgetting procedure, which is the subject of chapter 2, will be “deterministic”; it will not backup or undo 
previous decisions. Consequently, the machine will not only reject deeply center-embedded sentences but it 
will also reject sentences such as (22) where the heuristic forgetting procedure makes a mistake (takes a garden 


path). 


(22) #Harold heard [that John told the teacher [that Bill said that Sam thought that Mike threw the 
first punch] yesterday]. 


Marcus’ Determinism Hypothesis predicts that some sentences would be garden paths (since the state space 
cannot be exploded), but it alone docs not identify which sentences are GPs and which ones are not. He 
proposes a specific parsing model (Parsifal) to identify garden paths. Parsifal makes a single Ieft to right pass 
over the sentence. It has to decide what to do at cach point based upon a limited lookahead of three 
constituents. According to Marcus, certain sentences require more lookahead to disambiguate 
aly rithmically, and consequently, Parsifal has to guess what to do. In the garden path case, Parsifal guesses 


incorrectly. 


2§. A non-deterministic FSM with 2 States is equivalent to another deterministic FSM with 2” states. 

Po, pam imdebted to Ken Wexler for pointing this out. 

OF, Phe oxplodcd states eneade disjunctive alternatives (as observed in [Swartout78]). Intuitively, GPs suggest that it 
spt possthte to delay the ertiical decision: the machine has to decide which way to proceed. 

oS. Marcus’ bypathesis ts necessarily vague because there is no clear way to distinguish an exploded state from a 
primitive sles. without reference lo a particular machine (grammar). The definition becomes more precise when state 
aasigpicits are ids poudently motivated (by linguistic generalizations). 
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The three constituent limit is a very good description; all the garden path sentences shown above would 
require a four constituent lookahead to disambiguate correctly, (23) illustrates Marcus’ account on a typical 
GP. It would be acceptable if the machine looked ahead another constituent.” 


(23) #The horse [; raced] [y past] [; the barn] [, fell]. 


‘The three constituent story is not a complete explanation. Why docs Parsifal guess that raced is a main verb 
and not a participle? ‘The main verb interpretation is apparently the unmarked (preferred) case. Would it be 


possible to have a language where the participle reading was the unmarked case? 
1.6 Frazier's Principles 


Frazier [Frazier79] [Frazier and Fedor78} has attempted to describe the unmarked interpretations. She has 
proposed two. principles which arc presumably universal. ‘here is an‘intuitive flinctional motivation for these 
principles; they appear to require fewer resources (memory-and backup) than the atternatives. Frazier has 
provided considerable experimental evidence as ompiricat verification. 


(24) Minimal Attachment: Attach incoming matcrial into the phrase marker being constructed using 
the fewest nedes.consistcnt with the well-formedness sulcs of the language. ~ - 


(25) Late Closure: When possible, attach incoming material into the clause or phrase currently being 
parsed. 


ae 


29. In practice, the lookahead -will consist of neun phrases and:single words: thy machine does not, for example, build 
prepositional phrases in the lookahead buffer. Unfortunately this is crucial to Marcus’ account of the GP phenomena; 
Parsifal does sof analyze sentence (23) as. The horse [; raced} ty past the al F fel s fi it ai then it would be ablc to 
disumbiguate te senience. 

There are some other problems with this account. For example, waslerl afler tbe third constituent shoulda’t affect. 
the judgments, and yet. the sentences below seem to be more acceptable than (23). 


The horse raced past [; the burn] fell down. 
The horse raced past [; the barn] stumbled, 


We have no explanation for this data, Nevertheless, Marcus’ account is the best description in the litcrature; we will 
wecept it for the time being despite its problems. 
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Frazicr’s position is basically compatible with Marcus’; Ker principies define the unmarked actions when 
there is insufficient lookahead to be certain. Late Closure, which is relevant to the discussion on forgetting, is 
central to chapter 2; Minimal Attachment is the topic of chaptér 4. “Mhere are some (rare) cases where the 
principles fail to find a correct parse on the first pass, forcing backup in her non- deterministic framework. 
‘These will be interpreted as marked "counter-examples” in out deterministic FS framework. We will add a 


few marked rules to cover the exceptions. 
1.7 Capturing Generalizations 


Having laid out the basic framework (limited memory and determinism), it is worthwhile to gain some 
breadth. YAP has encoded a competence model strongly resembling the recent work of Bresnan and Kaplan 
{Bresnan78}, [Bresnan80], [Kaplan and Bresnan&80}. ‘They.use-two represcatations: a cunstitucnt structure and 
a functional structure. The former deals with mother/daughter relatianships whereas the latter is concerned 
with grammatical rales (subject, object, etc.) and syntactic features (casa, tesse, person, number, gender, etc.) 
Chapter 3 discusses the YAP implementation of genstituent structure, and Chapter 5, the functional structure. 


With the Bresnan-Kaplan representation system, it. is relatively. straightforward to implement many of their 
analyses. Chapter 6 presents some: typical iexical rules (raising and passive), thus capturing many of the 


gencralizations which were once believed to be beyond the capabilities ofa FSM. 


1 Tidy 


YAP also shares many propertics with Parsifal: it is possible to implement Parsifal-style transformations in 
YAP. Chapter 7 implements auxiliary inversion and imperative using Parsifal’s approach. ‘This demonstrates 


an alternative method to capture the generalizations that were used to "refute" the FS hypothesis. 


There are two classes of transformations which have been traditionally problematic for processing: 
wh-movement and conjunction. . Chapters 8 and 9-present:the approach taken ‘in YAP. Conjunction is 
particularly interesting because it has never been implemented i ina Marcus style deterministic. parser. Both of 
these constructions pose many difficult problems; only some of these have been asia However, YAP has 


produced some exciting initial results, correctly parsing the following sentences: 


30. She is crucially assuming a non-deterministic framework where the processor can backup past certain errors, In our 
framework, we need some exceptional rules to prevent the processor from taking the wrong path in the first place. 
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(26) Which boys and girls went? 

(27) Which boys and which girls went? 

(28) Which boys went to the ball and took the jar? 
(29) Which boys went to the ball and to the jar? 
(30) What boy did Bill look at and give a ball to? 
(31) Bob gave Bill a ball and John a jar. 

(32) Bob saw Bill and Sue Mary. 

(33) EF want Bill, Bob, and John to be nice. 


1.8 Limits of This Research 


It has not been possible to study all issucs relevant to parsing; we have touched on just a few of the many 


interesting problems. ‘This section will mention some areas for further study. 


(34) Coverage _ 

(35) Semantic Interaction 
(36) I.ength Bias (word count) 
(37) Lexical Ambiguity 


1.8.1 Linguistic Coverage 


There are many constructions which will not be discussed; YAP is similar to Marcus’ Parsifal in coverage. 
Both parsers handle a range of fairly difficult phenomena, are intended to handle robustly al interactions 


among these phenomena, though neither parser has extensive coverage. YAP docs not parse (38); 39), for 


example. 
(38) Tam taller than Bill. comparative 


(39) ‘The duck is too old to eat. tough movement 


We have nothing to say about the internal structure of noun phrascs’such a6 (40). It would have been - 


relatively straightforward to replicate Marcus’ approach. 
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(40) anice man 
a fallen leaf 
*a given child 


a hundred pound bag 
#a hundred pound bags 
1.8.2 Semantics 


YAP docs very little semantic processing. For example, YAP does not distinguish between. animate and 


inanimate objects; (41) and (42) are equally parsable from YAP’s point of view. 


(41) I gave Billa ball. 
(42) [I gave a ball Bill. 


It is somewhat difficult to distinguish semantics and syntax. YAP docs check graminatical relations (subject, 
object, ctc.). (43) and (44) are correctly distinguished because go and see take different arguments. 


(43) I saw Bill. 
(44) *f went Bill. 


We have not considered bound anaphora and quantifier scope as illustrated below. 


(45) Bill saw himself. | a bound anaphora 
__ *Himself was scen. . , 
*Each other were scen. 
(46) Bill saw everyone. quantifier scope 


Everyone was'scen by Bill. 


1.8.3. Length Bias and Lexical Anibiguity 


There are at Icast two other processing variables that seem to be relevant: Iength and Icxical ambiguity. Both 
of these are extremely difficult problems which have been widely studied clsewhere [Frazier and Fodor78]} 


[Milnc78a,78b,79,80}. (47) provide some evidence that length (ic. number of words) influences parsing 
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strategies:>! (48) illustrates some problems with lexical ambiguity. 


(47) #'The woman the man the girl loved met died. ae length 
7Vhe very beautiful young woman the man the sic loved met on. a cryisc ship in Maine diced of 
cholera in 1962. 


Joe brought the book for Susan, 
Joe brought the book that I had been trying to obtain for Susan. 


(48) ‘They were flying planes. lexical ambiguity 
‘The pupils were small. 


I love building blocks. . 
Whatever they are building blocks the view. 


All of these issues are extremely important topics for further research. 


31. This evidence is from [Frazier and Fodor78). Much of it is highly controversial; there may: be alternative accounts. 
Nevertheless, even if we can’t provide adequate evidence, it is most plausible that Icngth influences parsing sLrategics. 


ERA, 
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2. Closure 


YAP is essentially a stack machine parser like Marcus’ Parsifal with an additional bound on stack depth. This 
chapter will deal with the stack allocation problem. Theré' will be a forgetting procedure to remove finished 
phrases from the stack so the space can be recycled. ‘The procedure will have to decide (heuristically) when a 


phrase is finished (closed). 
2.1 On Left/Right Biases 


If we are going to count stack depth, we should be very carcful that stack depth corresponds to something 
meaningful. We will assume stack space ought to be correlated with the depth of center-embedding. 
Empirically, both left and right branching are relatively free in comparison with center-embedding as 
(49)-(51) illustrate.22 ——— _ 


32. This position is somewhat different from the “hold hypothesis" [Kaplin75] which accounts for center-embedded 
relative clauses but no other types of centerrembedding. We believe that a// forms of center-embedding become rapidly 
unacceptable even at shallow depths. For exampic, the following sentences from [Rich75] are unacceptable even though 
there are no center-enrbedded relative clauses. We accept Rich's argument that the “hold hypothesis” fails to account for 
all of the center-embedding facts. 


(a) #1 think [claiming [voting Republican] is immoral] is silly. 
(b) #1 think claiming [the dog {that bit the burglar] is scared] is silly. 


Notice that both eft and right branching have many “bunched up" brackets. Langendoen (personal conimunication) 
has observed that "bunched up" brackets are redundant, and hence they can be deleted without loss of information, Ina 
sense, the FSM manipulates the resulting representation. 

Alternatively, one might view the brackets as corresponding to stack instructions. An open bracket ([) is analogous to 
"push" and close (J) is analogous to “pop”. Deleting brackets corresponds to optimizing stack operations (e.g. tail 
recursion [Stecle78]). Just as "bunched up" brackets suggest a redundancy, a sequence of “pup” instructions in the logical 
flow of a program indicates wasted stack space. 

One can view closure as deleting brackets. like tail recursion, to optimize stack usage. [In left and right branching, it is 
possible to delete the “bunched up" brackets, and hence, bound the maximum stack depth. This fails to bound memory 
requirements in center-embedded cases where there are no “bunched up"~brackets”to detete. Chomsky’s proof 
[Chomsky59a.b] is a formatization of this intuition; center-embedding cannot be optimized because it requires. 
unboutided memory (there is tO way lo-convert a strictly CF ‘graniinat into a FS granimar). On the other hand, it is — 
possible to optimize non-center-embedded structures beGause they ure FS equivalent. es 
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(49) [{[fl he man]’s oldest brothcr]'s best friend]'s'sister}... “© left 
(50) #{I'he man [who the boy [who the students recogitized] pointed out] is'a friend of mine] center 
(51) [Ihe students recognized the boy [who pointed Out the“iidh {Whe BW adftiend of nine.) right 


Although we consider left and right branching structures to be equally casy to parse, there have been 
psycholinguistic models with a left/right bias. For example, Yngve [Yngvc60] suggested that Icft branching 
was more difficult than right branching because left, branching, is, extremely, difficult for .a left-to-right 
fop- down 1 parscr. 33 However, the dual argument could, have. been wscd 0, argue, pans HE branching 
right branching requires ; unbounded — because Chomsky act ms sop canes nbSANeN- cr 
structures (¢.g. left and right branching and combinations thereof) could be processed with a finite: state 
machine [Chomsky59a,b]. On the other hand, center-embedding is os difficult because it Peau’ 
unbounded memory; it cannot be processed by. a FSM. oe fo ten oe at 


It is possible that human processing is not optimal in this way}: there might in. fact be a left/right bias even 
though there is no computational motivation. ‘The research strategy taken here will investigate the optinal 
methods fi rst. Although computationally optimal procedures are not necessarily the ones people do i in _ 


“EEE if ae 2 


use, they are likely candidates for further research, 


One might arguc as Yngvehas, that English has a‘deft/tight: bias éven though no one has fouind a 
computational motivation. Hi:fatt; it is-very. difficult to: find acceptable ‘left’ Branching clauscs in English. 
There doesn’t seem to be an acceptable Icft branching ‘pattiptrrasé of (53) th Rrigitsh: as (53) and (54) itlusirate. 
Yngve's left/right processing bias is certainly not universal to all languages because there are languages 
(c.g. Japanese) where Ieft branching is just as productive as right branching is in English. “Hence the, ili 


asymmetry in Engiett is language spccific; it docs not indicate a bias in the human Processing system.” 


33. For example, left: branching is infinitely difficult ¢impussibte) for an ‘L1(K) parser. It also Gaused the Harvard 
Syntactic Analyzer (HPA) [Kuno66] considerable problems. 

34, There have been arguments fora feft/fight asyaimictry based Gn ihe assamption that ‘human processing is going 
(Icfl-to-right), Chomsky’s 1959 proof shows that these arguments are invalid, 

35. We know of nv languages which have bothleft and right bainiching Clatises. This generalization i is unexplained if it is 
indeed universal. 
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(52) It is interesting that it is indeed truc that John likes Mary. - _ right branching 
(53) #'That that Jahn likes Mary is indced true is. interesting, 
(54) #John’s liking Mary's being indeed truc is interesting. - 


2.1.1 Kuno’s Account 


Why do clauses tend to branch toward the left in Japanese and toward the right in English? Although there 
are no known explanations, Kuno [Kuno72] [Kuno74] ptovides a very attractive functional account of a 
related phenomenon: [Greenberg63} noticed that vso™¥ languages: are prepositional (right branching) and 
SOV are usually’ postpositional ‘(feft branching). eine se account does not apply in SVO languages like 
English.)>7 


(55) Universal 3: Ianguages with dominate VSO order are always prepositional. 
(56) Universal 4: With overwhelmingly greater than chance frequency, ae with normal SOV 
order are postpositional. {Greenberg63] : 


Kuno observed that Greenberg's principles happen to be optimal; if a _Janguage violated Greenbere’s 
principles then it would be more prone to center-embedding and consequently more difficult to process. 
Consider the case of relative clauses. Kuno observed that relative clauses should precede the head noun 
phrase in SOV languages and follow the head, in. VSO language ia order-to avoid center-embedding. . This is 
very casy to demonstrate. Examples (57) and (58) obey Kune’s observation. and avoid centcr-embedding; all 
Violations do in fact center-emibed as (59) and (60) illustrate,*8 


(57) [Sq 05 V9 that] S; 0} Vy not center embedded 


(59) S; [that $, 0, V3}0; Vy | center embedded 


36. §. Y and O stand for subject, verb and object. A VSO language has the Presomingal word order: verb, ai 
object. an 

37. [Frazier80} generalizes Kuno’s argument to apply to SOV language, though her ‘ssuMAptions are somewhat | more 
open to dispute. 

38. Recall that a center-embedded clause has lexical material on both sides of it, In this case,. the center-embedded 


clauses are surrounded by an Sand an OQ, 
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Furthermore, notice the complementizer that falls between the head noun phrase: and the. rclative clause. 
This also happens to avoid center-embedding. ‘The alternative would brackct the relative clause between the 
head noun phrase and the comptementivcr, forcing center-embcdding. ‘Hence, complementizers will precede 
relative clauses in VSO languages (universat 3) and follow refative Clauses in SOV tanguages (universal 4). By 
avoiding center-embedding inthis way, we have converged’ on sore of See: principles. ko 
shows how this reasoning can be applied to some other ‘constructions. 


This does not explain why languages are: this way, but it is an,attractive account which should motivate further 
research to verify Greenberg's empirical observations. Furthermore, Kuno’s. argument has no left/right 
asymmetry; only center-embedding is considered costly. It seems that center-embedding is universally 


difficult whercas left/right biases are language specific consequences of the'center-embedding universal. 
2.2 Closure Specifications " 


We will assume the stack depth should be correlated with the depth of center-embedding. It is up to the 
forgetting procedure to close-phirases and remove them from: the stack,’ so only center-embedded phrases will 
be left on the stack. ‘The procedtire coutd err in cither of two directions: it'could be overly ruthless, cleaning 
out a node (phrase) which will later turn out to be useful, of it éoutd’ be overly conservative, allowing its 
limited memory to be congested with unnecessary information. af dither case, ‘the a will run into trouble, 
finding the sentence unacceptable. We have defined the two types of errors below. We will argue that 


Kimball's Early Closure is premature and Frazicr’s Late Closure is ineffective. 


(61) Premature Closure: ‘The forgetting procedure piematirey femoves phrases that turn out to be 
necessary. 


(62) Ineffective Closure: The forgetting procedure docs not remove cnough phrases, eventually 
overflowing the limited memory. 


39. These definitions happen to have a functional flavor. We use the functional notion "machine" interchangeably with 
the algebraic notion "grammar". Our definitions should not be taken literally; we do not mean to imply that there is a 
forgetting procedure in the brain just because it might. be convenicnt.. We are merely suggesting Uhat a forgedting 
procedure is a useful metaphor for modeling the computation that takes place. ; 
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2.3 Kimball's Karly Closure 


‘The bracketed interpretations of (63)-(65) are unacceptable even though. they are grammatical. Presumably, 
the root matrix” was “closed off" before. the final, phrase, so tat the alternative attachment was never 
considered. Kimball is crucially assuming that closure is possible before, the daughters themselves have been 
completely parsed. Imagine that a node corresponds to a.collection of pointers to its daughicrs; it is finished 
when all of the pointers are connected. ‘This docs not require that the daughters themselves be finished. For 
example, the node [, Joc figured [?]] is finished when a pointer is established 'to the node MI even though the 


contents of [2] remain to be discovered. 


(63) #Joc figured [that Susan wanted to take the train to New York] out. 
(64) #1 met [the boy whom Sam took to the park]'s friend. 
(65) #'The girl; applied for the jobs [that was attractive]. 


Closure blocks high attachments in sentences like (63)-(65) by removing the root, node from the stack long 
before the last phrase is parsed. For example, it would close the root clause just before that in (67) and who in 
(68) because the nodes comp that] and [comp Whe] are not immediate constituents of the root, ‘The root 
clauses would be frozen in the following configurations: [Tom said s-}"! in (67) and [Joc looked NP] in (68). 
Having closed the root, it shouldn’t be possible to reference it again, In. particular, nothing cls¢.can attach 
directly to the root. ‘This model inherently assumes that memory is costly and presumably fairly limited. 


Otherwise, there wouldn't be a motivation for closing off phrases. 


(66) Kimball’s Early Closure: A phrase is closed as soon as possible, i.c., unless the next node parsed is 
an immediate constituent of that phrase. [Kimball73] 


(67) [, ‘Tom said 
fe that Bill had taken the cleaning out ... yesterday 
(68) [, Joc looked the friend 


40. A matrix is roughly equivalent to a phrase or a clause. A matrix is a frame with slots for a mother and several 
daughters. The root matrix is the highest clause. 

41. We use an x-bar Jackendoff77] notation. s- (x bar) dominates » in embedded clauses (s- -> comp s). It is also 
iMportant to notice that the s in [Tom said S-Jis not completely fireisteed: it is S possible to attach material to the embedded 
x but not to the closed root. 

42. Kimball's closure is premature in these examples since it is possible to interpret yesterday attaching high as in: Tom 
said [that Bill had taken the cleaning out] yesterday. 
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[,. who had smashed his now car ... up 


There is a slight problem with Kimball’s formulation which will become important when we propose our own 
proposal. ‘The unless clause should have a second condition to block closure until a phrase has all of its 
obligatory daughters. For example, taking Kimball's definition literally, si ‘The boy Sy -..]) should close 
before whu in (69) because who is not an immediate constituent of Ss). ‘This would be a mistake because: s} 
does not yet have a verb phrase. Closure should wait for all the obligatory daughters. lor example, an s has 
two obligatory daughtcrs: a noun phrase anda verb phrase. Conscquently, s) cannot close before who 


because it doesn’t have its obligatory verb phrase. 


(69) Fl The boy b who the teacher always liked best} did something really awful.) 


2.3.1 A Counter- Example 


Although Kimball's motivation to save stack space is well-founded, the precise formulation makes an 
incorrect prediction. If the upper matrix is really closed off, then it shguldn't be possible to attach anything 
to it. Yet (70)-(71) form a minimal pair where the final constituent attaches low in onc case, as Kimball would 
predict, but high in the other, thus providing a counter-example 6 Kimball's story. Evidently, the root was 
closed prematurely. in (71) because it is pessibic to-attach.a gotten driver to it: 


(70) I called [the guy who smashed my brand new car up]. low attachment 
(71) [called [the guy who smashed my brand new car] a rotten driver. high attachment 


43. A-scan take a number of optional adjuncts and conjuncts. 

44. We have a methodological suspicion about any theory which predicts an unexpected asymmetry. Kinibalf's 
principles (as stated in [Kimbal73}) have iwo such asymmetries: his model is both top-down and right associative. It 
happens that his predictions are basically correct for a right branching language fike English. but not for a left branching 
language such as Japanese [Cowper76. pp. 29-31]. Kimball's principles conflate several phenumena, involving both 
closure and language type. It ought to be possible to describe the closure phenomenon independently of word order. An 
idgal explanation would not distinguish between left and right, because sop kaigakiges are, lef, branching and-some are 
right branching. This is really a rather minor point though; restating the: facts in this way should puse no: particular 
problems, 
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Kimball would probably not interpret his closure strategy as literally as we have. Unfortunately computer 
models are brutally literal. Although there is considerable content to Kimball's proposal (closing before 
memory overflows), the precise formulation has some flaws. We will reformulate the basic notion along with 


some ideas proposed by Frazier. 
2.4 Frazier’s Late Closure 


Suppose that the upper matrix is not closed off, as Kimball suggested, but rather, temporarily out of view. 
Imagine that only the lowest matrix is available at any given moment, and that the higher matrices are stacked 


up. The decision then becomes whether to attach to the current matrix or to close it off, making the next 


higher matrix available. The strategy attaches as low as possible; it will attach high if all the lower 
attachments are impossible. Kimball's strategy, on the other hand, prevents higher attachments by closing off 
the higher matrices as soon as possible. In (70), according to Frazier’s late closure, up can attach*> to the 


lower matrix, so it does; whereas in (71), @ rotten driver cannot attach low, so the lower matrix is closed off, 


allowing the next higher attachment. Frazier calls this strategy late closure because lower nodes (matrices) are 
closed as late us possible, after all the lower attachments have been tried. She contrasts her approach with 


Kimball's carly closure, where the higher matrices are closed very early, before the lower matrices are done. 


(72) Frazier’s Late Closure: When possible, attach incoming material into the clause or phrase 


currently being parsed. 


2.4.1 A Problem: Right Branching 


Late Closure is an improvement because it does not close prematurely like Early Closure. Unfortunately, it is 
too conservative, allowing nodes to remain open (not closed) too long, congesting valuable stack space. Our 
compromise will modify Frazicr’s strategy enabling higher clauses to be closed carlicr under certain marked 
conditions. As late closure is defined by Frazier, right branching structures such as (73) and (74) are a real 


problem. 


45. Deciding whether a node can or cannot attach is a difficult question which must be addressed. YAP uses the 
functional structure [Kaplan and Bresnan80} and the phrase structure rules. For now we will have to appeal to the 
readers intuitions. 
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(73) ‘This is the dog that chased the cat that ran after the rat that atc the cheesc.that you feft:in the trap 
that Mary bought at the store that .., 


-(74) l consider every candidate likely to-be considered saassaka mee considered somewhat less than 
honest toward the people who ... 


‘The problem is that the machine will ‘eventually fill’ up with unfinished matrices, unable to close anything 
because it hasn't reached the bottom right-most clause, Hence it will find these right branching sentences just 
as unacceptable as center-embedding. Perhaps Kimball s suggestion is premature, but Frazier’ s is ineffective. 
“The compromise solution will strongly resemble ‘Fravier’s late closure strategy except there will be one 


marked case of carly closure to handle right branching structures. 


Our argument is like all complexity Arguments; it considers. the limiting, behavior as the number of clauses 
increase. Certainly there are numerous “other factors which decide borderline cases (3- -deep center-embedded 
clauses for example). We have specifically avoided borderline. cases because judgments are 80 difficult and 
variable; the limiting behavior is much sharper. In these limiting cases, though, there can be no doubt that 
memory limitations are relevant to parsing strategies. In particular, alternatives cannot explain why there are 
no acceptable sentences with 20-decp center embedded clauses. The only reason is that memory is limited; 
see [Chomsk y59a,b}, [Bar-Hillel61] and [Langendocen75] for the mathematical-arguments. 


‘ ere oe 


24.2 Analogies from L1{k) and. LR¢k) Algorithms 


It would hetp to abstract the ‘cfasure problem in terms of formal parsing algorithms. Among deterministic 
parsing algorithms, |.1.(k) parsing correspends-to the carlicst possible: closing whereas LR(k) corresponds to 
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closing at the latest possible moment” Jn L1.(k), the machine decides to close before any of the daughters 
have been attached, whereas an [.R(k) parser decides to close after all of the daughters have been attached. 
Kimball's scheme is not quite as ruthless as I.1.(k); his parser closes after all but the last daughter has been 
attached. HF razier’s scheme is remarkably similar to L-R(k), where the closure decisions are made at the last 
possible moment. Early closing schemes tend to be premature; they cannot parse as many constructions as 
later closing schemes. In particular, [.1.(k) cannot parse left recursive expressions. Later closing schemes tend 
to be ineffective, wasting memory. An LR(k) parser will push all the input onto the stack in the worst case 
(right branching).4” Closing carly reduces the parser’s capabilities whereas closing late increases the inemory 
costs.8 

It might be noted here that Marcus’ parser actually behaves very much like an [-R(k) parser in this respect,*? 
and hence, like Fravier’s scheme”” ‘That is, it pushes the entire right-most branch (from the root to the most 
recently read word) onto the stack, so that it never prematurely closes a node as an 11k) parser docs; on the 


other hand, it will often waste stack space like an |.R(k) parser. 


46. Recall that both [E1(k) and LR(k) parse CF grammars on a deterministic stack machine (DPDA). 1.L.(k) is purely 
top-down: the machine decides which production to expand (push onto the stack) given the mother and the next & input 
symbols. The stack is popped when the next input syinbol matches the top of the stack, This discovers the feft-most 
derivauion (for ambiguous grammars). 1R(k) is the duak the machine decides which production to reduce (pop off the 
stack) given the next A input symbols and the previous state. Input symbols are pushed onto the stack when there are no 
productions to reduce. ER(k) finds the right-most derivation, 1LL(k) is a predictive parser because it predicts expansions 
top-down: ER(kK) ts a shifiereduce parser because it decides whether to shift (push an input symbol) or to reduce (pop a 
production off the stack). 

11(k) are optimal for purely right branching structures; the stack grows infinitely on left branching structures (doesn’t 
hilt), and finearly with the depth for center embedding. but is bounded on right branching. LR (k) parsing is the dual: it is 
optimal for purch left branching structures where the stack depth is bounded. On center and right branching, the stack 
depth grows tinearly. th(kK) is notas general as LR(k). but itis more space efficient whee it works. In our terms, LEK) 
parsers suffer from premature closure whereas LR(k) parsers may require more memory (ineffective closure). 

47. Hememory ts cheap then LR(k) is very attractive. Currently several computer programming languages are parsed 
with ERK) techniques because the memory demands are tolerable, We are assuming that human short term memory 
limitations are far too severe for such extravagances. 

48. [Iimibull7§] makes a similar pomt. Ue offers two compromise points between EE(k) and 1.R(k) and shows that the 
corresponding ltnguages are all properly nested. (Both compromises appear to be premature: the arguments are not very 
interesting.) 

49. Marcus himself has argued this point on many occasions (personal communication). 

SQ. Marcus had not been thinking about the clusure issue. Nevertheless, his work forms an interesting data point among 
the possible closure strategies. 
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2.5 A Compromise 


We have designed YAP to close late by default (like 1LR(k), [Prazier79] and {Marcus79}) with one marked 
exception to alleviate the memory. load (in, the right: branchiag gase):?!: The marked case of carly closure is 
described by the 4-over-A carly closure principle. It is very much like Kimball s carly closure principle except 
‘that it waits for two nodes, hot just one. For cxample iit 75}, Gur principle would close [, that Bill said $5] 


just before the that in S, whereas Kimball's scheme would closc it just before the that in S>. 


(75) John said [] that Bill said h that Sam said [, that Jack ... 


(76) ‘The Acover-A early closure princiats: Given twa, eae in, Fee same category (c.g. noun phrase, 
verb phrase, clause, ctc.), the higher closes when. 1 both amg, Sligible. f for Kimball clogure. That. is, 
(1) both nodes are in the me category, (2) the next node pane is not. an immediate constituent 
of cither, and (3) the mother™ and all obligatory daughters have ‘been “attached to both nodes. 


This principle, which is more aggressive than Frazier late closure, ‘enables the parser to process unbounded 
right recursion within a bounded stack by constantly closing off. However, it is not nearly as ruthless as 
Kimball's carly closure, because it ,waits for two. nodes,.which, may alleviate the problems: that Frazier 
observed with Kimball's strategy. . 


There are some questions about the borderline cascs where judgments are e extremely variable. Although the 


A- -over-A carly closure principle makes very sharp distinctions, borderline gases are often questionable. See 
[Cowper76] for an amazing collection of subtle judgments that confound every proposal yet made. However, 
we think that the A-over-A notion is a step in the right direction; it has the desired limiting “behavior, 


although the borderline cases are not yet understood. Chomsky comes to a similar conclusion: 


SI. Farly closure is very similar to a compiler optimization culled. ail. proursion. which converts right recursive 
expressions into iterative ones, thus optimizing, shack USAR, . Compilers. would perfarnyoptiniizations only when they can 
prove that the structure is right recursive; the A A-uver-A closure amiga somewhat heuristic because the structure may 
turn oul to be cente embedded, , 

$2. A node can't clase until it knows who ils mother i is. This i is ietiuaaél. bucause. iis, sieibll in YAP ty build nodes 
bottom-up. They might have jd their daughters, but nol their mother, Scoundly, we assume the root docsn’t have a 
mother and hence it cannot close. This will have sonw inapteean laplicadigas as. we. will sec. 

53. Notice that an A-over-A-over- “A ‘principle , would aio have the supe limiting behavior, In general, if there are a 
categories, an A-oyer-A_ principle .would limit. the stack depth to .2n. (in. the, sight. branching case) whereas an 
A-over-A-over-A principle would limit the depth to 3u. The difference (between 2 ang,3) is a constant which cannot be 
distinguished by a complexily argument of this sort. It is an-cmpizical question which is:preterable. 
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(77) “Obviously, acceptability will be a matter of degree, along various dimensions. Onc could go on 
to propose various operational tests to specify the notion more precisely (for example, rapidity, 
correctness, and uniformity of recall and ‘recognition: ‘normalcy of intonation). For present 
purposes, it. is unnocessary to delimit it more carefully.” [Chomsky65:pp.' 10] | 


We are still experimenting with the YAP system, looking for a more complete solution to the closure puzzle. 
2.5.1 Predictions 


Many of Frazicr’s observations also apply here because YAP closes late by default as in her model (except for 
the A-over-A early closure principle). As long as A“over-A carly closure dodsn’t apply, YAP behaves just like 
Frazier's model. ‘In particular, both’ Fraziér’s fate’ closure and YAP are not premature, unlike Kimball’ s 
scheme. Consider the ' ‘counter-cxample" to Kimball’ S carly closure: 


(78) I called the guy [who smashed mycar.,.up]} 
(79) I called the guy [who smashed my car] ... a rotten driver. 


Kimball's scheme prematurely closes the root clause just before who which is not an immediate constituent of 
the root. That is, it prematurely decides the root looks like [, I called NP} regardicss of what follows who. As 
we have previously noted, when a@ rotten driver is finally reached, Kimball's scheme will be stuck. Frazier’s 
late closure is an improvement because it keeps the root open until a rotten driver is parsed, YAP behaves just 
like Frazier's model in this case, because the A-over-A carly closure principle docs not apply. Hence, YAP is 


not premature (at Ieast in this case). 


54. The A-over-A closure principle is an incremental forgetting procedure. One could imagine another type of forgetting 
procedure which waited until the system ranshort-un space ‘and onty then ii would search the stack for * “garbage”. (In 
some sense. Frazicr’s PPP avoids “shunting” until it is ninhing short of space. “Hence the PPP is effective, though the SSS 
is now stuck with the problem.) In this framework: the forgetting procedure ‘dels as a background process which 
“interrupts” the parser whenever space runs short. This interrupt approach is quile plausible though it poses a few 
problems. First. like a LISP garbage collector which abso waits untif the coniputér is: out of CONS space, it is not 
quasi-real-time (bounded amount of time between reading any two input symbols). This is a particularly undesirable 
property of LISP for real-time applications (like airline guidance systems) bédauSé the airplane might crash during a 
garbage collection. Secondly, interrupt: driven systems are extremely difficult to debug and verify because it is very 
difficult to replicate the same situation twice. Consequently, it would appear quite difficult to model real psychological 
dita within an interrupt framework. Thirdly. the interrupt mu hati is yel another device which must be stipulated. 
The incremental approach avoids alf uf these technical problems. °° 
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(80) John said [, that Bill said [5 that Sam said [, that 


YAP’s closure is more effective in the right branching case because A- over-A early closure will apply. For 
example, pure late closure will eventually féad to a memory “overflow i in right branching sentences like (80). 
Pure fate closure will find right branching just as bad. as eenicrcmbedding, On the other hand, YAP’s early 
closure will constantly close nodes arly (befure: reading the entire ‘sentence). thus preventing a memory 
overflow. For example, it will close 5 7A; that Bill said spi ds soon as it attaches the last daughter to s> In 


sentences like (80), carly closure removes nodes just as fast. aS new ones are ‘being formed. In this case, YAP is 


effective (unlike Frazicr's late closure). 


(81) John said {; that Bill called the guy 9) that Sam said Af) that ... X 
(82) #lohn said h that Bil called the guy h that Sam said [3 that. || x 


There are some empirical consequences of closing carly. For example, nothing can attach to’a closed node. 
Hence it should be possible to test the A-over-A carly closure principle by noting whether or not nodcs closed 
under the principle actually do block further attachments. For example, in (81) once s , sclosed it shouldn't 
be possible attach ¥ to it as in (82). We will illustrate several types of X's: adjuncts, conjuncts and optional 


arguments. 

(83) John said FF that Bill called the guy ... yesterday. : - adjunct 
(84) John.said [, that Rilt:called the guy ... and: Sam called the girl. ea conjunct 
(85) John said [, that Bill caffed the gay a fitten driver, ~~” optional argument 


Closure principles merely state which attachments are possible; they do not spocify preferences. ‘There is 
considerable literature noting that X’s tend to attach as low as possible. A similar principle will be discussed 
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in chapter 4. It will be qualified to favor attachments to the lowest possible open node.>5 


‘There is a second testable prediction: no interpretive rule can, apply, inw a closed node. ‘That is, linguistic 
domains (command, ¢-command, f-command, ete.) have gaping holes where. phrases have been closed off. 
‘These holes are opaque islands to rulcs of bound anaphora. (reficxive and reciprocal), de quantifier scope, 8 
and reference (noncoreference), We will show that the. Progiction, appsars to hold for [asnik’s 
noncoreference rule. We will not discuss other interpretive rules here. 


2.5.2 Adjuncts 


The underlined phrases in (86) and (87) are called adjuncts, ‘They can generally attach to any open node along 


mbt! ed 


the right hand edge, thus accounting for the mulipe interpretations. (Admittedly, there is a strong 
preference for low attachments.) ee een en 


a8 g 


55. This will attach to the lower matrix even if the higher atlachasent is the only asiearinical ee ‘For example, 
(a) and (b) are marginal because the final phrase tends to allach ow, which i is upgrinumatical. : 


(a) 7H looked the guy who smashed my car up. 
(b) ?Put the block which ison the box on the lable. 


It seems that this is the cufrect prediction in the unmarked case: the acceptability might improve if the parser could be 
given some helpful hints (punctuation or intonation breaks) to block the low attachment. Unfortunately, this account 
incorrectly predicts that the sentence will become more acceptable if there is a second argument for the higher matrix as 
in: #7 looked the guy [who smashed my car up] up. The second up cannot attach to the embedded clause, so it should 
attach to the higher matrix. fulfilling the granmmiaticality requirements. Unfortunately, the sentence is much worse with 
the second wp. This is a serious problem for the current approach. 
56. Interpretive rules, such as Lasnik’s Noncoreference rule, apply over limited domains of the parse tree. We have 
already defined command: c-command and frcommand are slight variations. Conimand is defined in terms of clauscs, 
¢-commaad in terms of constituents. and command in terms of functional structure (chapter 5). 
$7. reflexive: They hil shentyselves. 

reciprocal. They hit each other. 
58. (a) Fveryone in this room speiks at feast two languages. 

(b) At feast two languages are spoken by everyone in this room. 


Sentence (a) has so-called wide interpretation (for all people there are two languages such that cach person speaks them); 
sentence (b) has aarrew scope (there are two languages such that everyone speaks them). See [VanLchn78]. 
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(86) John said that Bill did it yesterday. ae 
John said [that Bill did it yesterday]; i) PEAS ots a - | low attachment 


John said = ahead aide 3G ans We ig oty . high attachment . 
(87) John said that Bill did it fo erica. 

John said [that Bill did it to get ahead]. low attachment 

John said {that Bill did it] to get ahead. high attachment 


The interesting claim is that adjuncts cannot attach to closed nodes. For example, yesterday can not altach to 
s, in sentences fike (88) because A-over-A carly closure’ would apply first. 
; pee 33° F ee ie 713 age SSR DP eA Es a 


(88) John said Fi that Bill said that Sam said ... that Jack did it yesterday. 


Although this sccmas to, be the case, it.is yery. hard fo. test the,constitucacy relations with. time adverbials like 
yesterday. Purpose adjuncts (such as fo get ahead) suggest a much sharper test. Notice that (89) and ra have 
different inderstood subjects, teflectitig the aifferent’ consist Felations. 


(89) John said [that Bill did it (for Bill) to get ahead). 
(90) John said {that Bill did it] (or John) to get ahead. 


Itis possible to test the cogstituency rclations indirectly using well- known facts about subjects. For example, 


(91)- (92) are unambiguous; the alternative constituency relations (93)- (94) a are “ungrammatical since they 


violate binding conditions on reflexives. 


(91) ‘They said [that Bill did it to get himself out of hot water]. | 
(92) The said [tht BiH did k}to-get! thomatves ott of hot water. 


(93) *They said that fi Bill did it] to get hielo of hot water. 
(94) *They said [that Bill did it to get themselves out of hot water]. 


Now, it should be possible to test whether a node is closed. The purpose adjunct in sentences (95)-(98) must 
attach to s I But this will be unacceptable when 5; is closed as in (98). As usual, the borderline cases (96)-(97) 


are somewhat marginal. 
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(95) Did you hear [, they did it to get sherhecivey’e out ofhot water? 

(96) 2Did you hear [) they said that Bill did it to gct themselves outiof hotwater? 

(97) 221Did you hear [, they said that Bill said that Sam did it to get themselves out of hot water? 

(98) # Did you hear [, they said that Bill said that Sam said that Jack did it to vate themselves out of hot 
water? Ne eee Bg og 


2.5.3 Conjuncts 


There is a similar argument using conjuncts instead of, adjuncts, Assuring that. closed nodes cannot be 
conjoined, conjunction should become more and more difficult in (99)- as since s, is more and more likely 


to close carly, 


(99) Tsaw a boy {, who dropped the delicate modet ape anid-who _ it up and ia to cry: 


(100) 21 saw a boy [) who dianped the delicate ciedel silane he, had so atehiliy been making at the 
school]] and who picked it up and began to cry. 


(101) 721 saw a boy li who dropped the delicate model airplane | he had s $0 carefully been making at 
the school [; where I went [4 when } was young}[]] and who picked itup and begantocry. 


The claim that conjunction is limited to open nodes is ; also useful in pain” Suppose that we had an 
algorithm for deciding closure. Then we would know exactly which conjunctions are possible because 
conjunction is permitted between open nodes of the same caicuoty ‘This considerably reduces the 
combinatoric explosion of possibilities that has made it 9 taublesame. to. parse conjunctions. It is an 
interesting fact that conjunctions, at least their-aereptable iatosprosations,.are never, more: than a.few ways 
ambiguous, even though non-deterministic pores ah as A r es can ro ne ee a number of abaud 


possibilities. 


59. Some open nodes may not permit conjunction because they are stacked up and hence out of view until the lower 
possibilities fail. The preference for the lowest open node will be discussed in chapter 4. 

60. There are some grammiaticality constraints on conjunction which further restrict the possibilities which become 
important to chapter 9, 
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2.5.4 Optional Arguments 


There is a third argument supporting carly closure. Unfortunately the data are extremely controversial®! and 
there may be several alternative accounts for the facts. It would not be disastrous for the A-over-A carly 

AES ; f 
closure principle if the facts happen to fall the other way. Nevertheless, we will give the argument because it 


illustrates the approach, even though the evidence is not as clear as it might. be. . 


(162)-(105) test whether s is open or closed. Ifit is closed, ‘the optional argument a rotten driver cannot attach 
and hence the sentence should be unacceptable. This accounts for tte judgments in the extreme cases; 5, is 
open in (102) and hence (102) is acceptable, whereas Sy) is closed in (05), ‘and hence (105) i is unacceptable. As 


usual, the borderline cases (103)-(104) are marginal. ‘The A-over-A carly closure principle happens to exclude 


these marginal cases here; this should not be taken too literally. 


(102) Did you hear f that I called the guy [, who smashed the cara ratten en driver? 


(103) ?19id you hear [, that | called the guy [) who smashed the car [; that I bought last ov a rotten 
driver? 


(104) 7?Did you hear that [ called the guy who smashed the car [; that [ bought last ycar [4 just.after the 
old one needed a new transmission]] a rotten driver? 


(105) #4)id you hear that I callod'the guy who sttiasticd the car [; that f boditit last year ly just after the 
old one nceded a new transmission [5 which would: have cost $2009] a ronen driver? 


This account crucially depends on the optionality of the argument a alien diver if it were obligatory, then it 
wouldn't be possible to close.s, until.a secand argumeat te call is found. Ard henoe, carly closure would nat 
be an adequate account of the data beeause it cahnot.apply to the crucial matrix's ‘Phere is-satne evidence 
that @ rotten driver is optional; our informants report that Ope is much posal without the Ana) phrase. (This 


judgment is controversial ) 


61. Berwick (personal communication) reports different judgments when the crucial ol eae) were nara Our own 
informants were given written texts, Buth expériments were fun) Beg 
62. A very plausible alternative is that cal/ is lexically ambiguous; there is a verb call NP as in | called John and there is 
another verb call NP NP as in [ called him a name. Assuming that a clause can't be closed until its predicate has been 
disambiguated, carly closure cannot apply to the crucial rvitrix containing the verb call, And hence, the data may not be 
relevant to the closure issue. One could take this argument to an extreme and say that all verbs are lexically ambiguous 
and cannot be disambiguated until the clause is completely parsed, and hence, early closure would always be blocked. 
Then it isn’t clear how right branching sentences could be parsed. The lexical ambiguity argument is very tricky. 
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(106) Did you hear that I called the guy who smashed the car that | bought last year just after the old 
one needed a new transmission which would have cost $100? 


2.5.5 Noncoreference 


Lasnik’s noncoreference rule [lasnik76] is another source of evidence. Previously we showed that 
noncoreference in sentences like (107)-(109) was less and less likely to apply. In this subsection, we will claim 
that noncoreferential links cannot be established into a clased node. Again, the extreme cases are much 
sharper than the borderline. Noncoreference is clearly established in (107) where the crucial clause is. open. 


‘The judgments become less and less sharp as s; is less and Iess likely to be open. 


(107) *Did you hear [; that John; told the teacher that John; threw the first punch? 

(108) ?Did you hear [ I that John, told the teacher [ that Bill said that John; threw the first punch? 

(109) Did you hear [} that John; told the teacher b that Bill said h that Sa thought [, that John, threw 
the first punch? 


2.5.6 Root Clauses 


‘The A-over-A closure principle (unlike Kimball's account) predicts that root clauses have a special status with 
respect to closure. ‘The root clause will aever close because it can't have a mother. In particular, this suggests 


that it is always possible to conjoin to the root no matter how many clauses intervene. 


(110) I saw a boy who dropped. the delicate model airplane he had so carefully becn making at the 
school where.J went when. | was young and you'saw a girl:-do the same. 


Similarly, this predicts that root clauses can always take adjuncts. However, it docs not predict that optional 
arguments can attach to the root because they are dominated by a verb phrase which docs have a mother. 


Hence, the verb phrase could close carly, blocking optional arguments from attaching. 


(111) #Joc vp figured [that Susan wanted to take the train to New York}... out. 
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2.6 Summary 


In conclusion, we have argued that a memory limitation reduces. the, overall time.and space requirements (by 
fiat); the competence model alone. cannot achieve such tight, bounds, Although :it is very difficult to discover 
the exact memory allocation procedure, it sccms that the closurc,.phenomenpn offers an interesting, set of 
evidence. There are basically two extreane closure models.,jn, the Jitcraturg.. Kimball's carly closure and 
Frazicr’s late closure. We have argued for a compromise position, the A-over-A.catly.closurc principle, which 
shares many advances of both previous proposals. without sang of the attcndant disadvantages, Our principle 
is not without its own problems; the borderline cases are extremely difficult. It seems that there is 
considcrable work to be done, ; 
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3. Constituent Structure Implementation 


YAP's scarches for a mapping from a string of words to 4 set of granimatical relations (subject, object, etc.) In 
the Bresnan-Kaptan system [Kaplan and Bresniang6], the ‘resulting ‘grattirnatical relations’ form a functional 
Structure (fstructure). “Phere ts ati’ hitetnediate’ representation ‘called’ the constituent structure (cstructure) 
which captures structural tefations (mother: diughtér, sister’ dtc). “te system has an algorithmic procedure 
for building the fstructure from the estructuré: he chapter § wil destribe how VAP docs this; this chapter is 
mainly concerned with bttilding the cstructure in'the first places 


The cstructure has similar counterparts in most other linguistic representations. For example, 
transformational grammar starts with a sct of CF base rules and then applics a number of transformations. 
Similarly, ATNs start with recursive transition networks (R'TNs) which are CF equivalent and then add a 
number of augmentations. Bresnan’s cstructure is a CF description. ‘The mapping from the cstructure to the 


fstructure is analogous to transformations in ‘IG and augmentations in AT'Ns. 


There are interesting differences between all these systems; we have adopted the Bresnan-Kaplan framework 
because it seems casier (to us) tv map it into a practical FS deterministic parser. Even if TG, ATNs, and the 
Bresnan-Kaplan framework are all notational variants of onc another (which is unlikely), the Bresnan-Kaplan 


framework might be morc useful for our purposes since it is not obvious how to encode the other models into 


a FS deterministic parser. 


63. There have been many articles relating ATNs to processing stratcgics [Kaplan72] [Wanner and Maratsos78] 
(Bresnan78}. All of these require more resources (memory and backup) than we are willing to allocate. Their approach 
appears to be very dificult: although there was great hope in the carly papers, it is very difficult to make further progress. 
McDonald (personal communication) has pointed out that traditional ATNs are analogous to PLANNER [Hewitt72]; 
both replace knowledge with brite force automatic backup. More recent Al problem solving languages (c.g. TMS 
[Doyle78)) replace notions like automatic backup with dependency directed backup. We sce the same trend in language 
processing (c.g. GSP [Kaplan75]) though there are many details to be solved. We have avoided many of these difficult 
problems by stipulating FS and Determinism. It seems that the Bresnan-Kaplan framework is more compatible with 
these stipulations than more general frameworks (which permit nun-deterministic side-cffects), though it would be 
difficult to formalize this intuition. 
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(112) Lama boy. 
(113) fg fup- Ep fy 21 ap- fap Lace I fn Povllll 


This chapter will discuss how YAP builds the cstructure. ‘The problem is to map a sentence like (112) into a 
structure like (113). The cstructure is a tree™ of phrases (nodes). Phrases are delimited by square 
brackets ([ p> labeled with a category (part of speech). A category has two parts: a major categorial feature 
(n, ¥, a, p) and a “bar™ level. YAP uses a three-bar system; there are nouns (n), ritnin phrases (np) and np 
bars(np-). Similarly there are verbs (v), verb phrases (vp), and participial phrases (vp-).7 In all, YAP has 4 
major categorical features which have three bar levels, forming 12 categories. ‘There are 6 other categories: s, 


s-, det, comp, conj and punct.® 
3.1 ‘The Machine State 


YAP has four components taken almost directly fram Marcus’ Parsifal: 


(114) the input stream _ the input stream 

(115) the upper buffer the stack - 

(116) the tower buffer "the lookahead buffer’ 

(117) a deterministic FS control device a grammar of aaa tules - 

A snapshot of the machine is shown i in figure 1. The string * = =WAl L==" is is printed between the upper 


and lower buffers. Buffer cells are filled with nodes, (parsed. phrases) which arc ptinted in square 
brackets ({)). 69 Both buffers grow in toward the WALL. as the, machine  PArSCS | toward the center (the WALI ) 
from both directions (both top-down from the root and bottom up from. the input). ‘The upper buffer 
contains mothers which are building down to the WALI. and the lower buffer contains daughters building up 


64. This condition will be weakened to ercode structural ambiguity (pscudo-attachment). 

65. These are oflen called phrase markers (or P-markers) in the linguistic literature. 

66. n = noun: v = verb: a = adjective; p = preposition 

67. The “bar" system was first introduced in [(Chomsky70] to describe certain generalilics between noun phrases and 
clauses (i.c. John's having criticized the book and John has criticized the bogk), See Uackendoff77] for a more current 
reference. The term projection refers to the next higher bar level. For example, ap- is a projection of ap and np is a 
projection of a, The third bar level is the maxinel projection. an YAP). (Phere bewe: been proposals fur five bar systems.) 

68. s = sentence: sis aUprojection of s; det = determiner, comp = eumplomentizer (for: thas....): conj = conjunction; 

punct = punctuation. These categories don’t fil the bar pattern. very well. are 

69. Printing is.a very expensive:tank: it requires scarching thy. fringiscof the persed eubseces to find. the individual words. 
The parser itself is nut permiticd to undertake such expensive tasks: Zeshmieally the.printer és not part of the machine. 
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Fig. 1. A Snapshot 
sentence: I am a boy. ‘ 


input pointer: boy. 

up3 

up2 | | 

up] ~—s]) A. 3 aise 1 es the upper buffer 
==WALL== 

down] i p- 1] the lower buffer 

down2 [, am] 


down3 [got al : bd 


to the WALL. Nodes (parsed phrases, i.e. nonterminals) enter and exit near the WAL. in stack fashion (via 
push and pop operations). ‘That is, upl and downl are the ' “top” of their pete oullers “(otacks): New 


words (terminals) enter the lower buffer from the “bottom” (dowad}. 


YAP is deterministic and FS for reasons discard previdusly. The control device (117) is defi ned tu be 
deterministic. That is, from any machine state there i is exyetly one applicable graminar rules, backup. is 
absolutely excluded. ‘The FS limitation has.bcen implemented i in YAP by truncating the buffers to. 4: fixed. 
length and limiting the size of a buffer cell. The. bounds en the two buffers ‘have not yet been defined. ‘The: 
first three cells of cach buffer are referenced so frequently that it is convenient to name them up/, 

up2, ..., down3 as in figuée 1. In fact, the buffers may be longer. ‘The complexity arguments suggest that they 
should be limited, but it is not clear what the Timits should be.” Setting the exact limits (constants) would 
require considerable psychological experimentation. “The lent of the upper buffer measures the maximum 


allowable nee of center émbeddi ing. ‘The lower buffer n micasuires s the degree of lookahead. n 


ui 


70. The-bound on each buffer is a parumeter which cun‘be adjusted at runtime. 

7h. Chomsky (personal commiunigation) peints-out that bounded: leokahend might-be equivalent to. some sort of 
bounded backtracking. In which case. the lower buffer could be thought of. as measuting the degree ‘of backtracking. 
[Ultman65] discusses. lwo interesting formabizations of hounded agama swags Sobers is PPRODADY very 
similar to bounded backtrackiag and: bounded leokthead 666 oe 
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There are some interesting issues associated with fixing the size of nodes. In Parsifal, a node is literally a 
pointer to a subtree (parsed phrase). A YAP node is an abstraction of the releyant features, not the entire tree 
itself’? It is important to bound the size of a node in order to prevent encoding unbounded memory into the 
nodes.’3 This guarantees that any predicate can be tested i in a fixed amount of time. Parsifal stores subtrees 
in the stack cells; it could take an arbitrary amount of time wo search a subtree for some property (such as the 
value of a trace.)’4 Similarly, the formal system outlined in {Kaptan and Bresnan80] permits two unbounded 
nodes to be unified which also requires unbounded time.” Although this is a theoretical point, it does bring 
up some very difficult questions regarding abstraction (inheritance). Which features does a mother inherit 
from its daughters and which features are opaque to further inquiry? This question will be studied in 


chapter 5. 
3.2. Production Rules 


Only one more component is needed before the machine can run. We have to specify a procedure for 
determining which actions to apply next. We will begin by describing a very gencral technique. ‘The next few 
sections will present some more specific techniques which should cover the mest common: wnmarked cascs. 
The more gencral techniques will be uscd only in very marked cxéeptional situations. We introduce them 
first because it is relatively casy to sec that they are sufficiently powerful; however, they are so powerful that it 
is very difficult to combine them effectively into a good structured program (grammar). 


‘The gencral technique is to use a set of production mules (like Parsifal’s grammar rules) which uniquely 
determine the actions to perform. That is, the first applicable production mule is selected. We are strongly 
depending on Marcus’ Determinism Hypothesis; the rules cannot backup ar.sprout new processes like a 
non-deterministic machine. A sample rule might be: 


72. Actually the tree structure is maintained for the printer's convenience: the purser itself docs not look beyond a singte 
level of tree Structure, The parser is a FS (rainsducer ‘which’ inpms wordls amd oiitputs tree sirticture.. These structural 
links, which are maintained for the printer. should be viewed as part Of thie ontpint, Hot the intemal state. 

73. Ifa node could be arbitrarily’ large then it could encode ainytHibng. ‘Gott rencading i is it farticabatly extreme technique 
of accomplishing this undesirable conscytiéner, 

14. Traces are a formal linguistic object which will be discussed in clugiter 8: Pursifat attows traces to be bound to other 
traces and hence it may require unbounded time to retrieve a value front a long chain of traces. 

75. This _ Property provides | considerable computational power, “That system is capable of 7 Parsing: cs languages of the 


form: abe" [Kaplan (personal communication). 
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(118) (defrule attach-subj 
(pattern (=s)(=np- =vp)) 
(action (attach))) 


This rule would say, if there is an s (sentence) node just above the wall (up1) and there is an np- followed bya 
yp just below the wall (in down] and down2), then attach the np- (down!) to the s(upl). It is very similar to a 
CF rule of the form:”6 | 


(119) s -> np- vp... 


The pattern has a limited window; it can only reference the first three cells in cach buffer and features 


immediately attached to them. (120) lists the syntax for a pattern. ‘There are six predicates 


<upL, .... <down3> associated with upl, ..., down3, respectively.”” If the predicates and the fisp expression”® 


return true, then the pattern “matches”. 9 


(120) (pattern 
(<up3> <up2> <upL>) 
(<down1> <dawn2> <down3>) 
<lisp expression>)) 


76. CF rewrite rules are often viewed as top-down (generative): Fhis asymmetry is purely a matter of convention; they 
could have been formulated in a bottom-up fishion. Our representation is neutral with respect to top-down and 
bottom-up. 
77. (f the pattern contains less than three predicates, the specified predicates apply to the cells closest to (he WALL. For 
exaniple. (118) applies to up]. downl and dawn2 because up! is the closest to the WALL froni the upper buffer, and. 
down! & down2 are the closest fram the lower buffer... 
78. This lisp expression must be sidy-effect-free (cannot change the state of the miichine in any way). It is also 
constrained to the first Uyree cells in cach buffer and their immediate descendants (b conyention). 
79. Although it is useful to think of the Pattern nuatching, ys it Jincgr search, Ahey ¢ gtuually’ livverted (hashed) on <up1> 
and <down >. In practice, approximately seven patterns are tested before finding amatch, The: jest/maich ratio had been 
4:1 before certain rules were added. Theoretically it should be pussible to do guuch | better: the iest/match ratio should be 
2:1 or better. 

The matching procedure deserves much more altention. This may, he the proper place to incorporate lexical 
expectation and extremely subUe preference data, which are often taken as evidence for a backup mechanism. Since 
lookahead is analogous to backup, it ought to be possible to encode these facisin a fookisticad frariework. 
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vere 
grits tist 


actions. This s section will discuss three primitive actions: ana: eee is a athe basic scsratltina 
appear in most parscrs in one way or another. In aa AYN the cosntsponding uporations teaversc:an:arc, push 
to a new network, and; pop fram a nctwork, In Barley's. alguridan.[earey3O} the corresponding opcrations are. 
scan, predict and complete. Basicully, any tcc traversal algprithaa will-havei thee Corresponding operations: \ ° 


tidy 


( 121) move across from one sister to > the next . _ (attach, traverse a are, Scan} 


(122) move down from a mother to the first daughter - ree (predict, push) 
(123) move up from the fast daughter to thé mother Pa I EN RR ges: pop. complete) 


at 
‘These actions are implemented in terms of buffer operations. Recall that both buffers are building toward the 
WALL; the upper buffer holds mothers looking tup-duwti for daughitirs (onl'thesuther Side’ of the WALL) _ 
whereas the lower buffer holds daughters looking Bottom-up For Mothers. ‘(Tie grass Is dlways'grdenct on the 
other side of the WALL, so to speak.)® When a daughter and @ mother filnitty tfitd cach other (attach), the 
daughter is popped from the lower buffer because it is no longer Pek for ebltaiaas it is then pushed on onto 


the upper buffer, to enable it to find some of its own daughters.8! As we will sce, up! will inherit certain 
features (c.g. person, number, gender, ...) from down. to, reflect the attachment, : Fipally, the. mother. and: « 


Fig, 2. The Attuck Action Pe : i 
Attach pops down! from the lower buffer and es it into the ° upper buffer. 


sentence: I am a boy. 


input pointer: boy. 

before after 

[,] [, Hl 
==WALL== [np- I] 

[np- 4 = =WALL= 3 

[y am] [y am] hs) 
Lact al fact al 


80. In GSP terminology [Kaplan75}, the upper buffer holds producéts abd the lower buffer holds consumers. 
81. Daughters can be attached before they themselves are complete. “This is crucial for carly closure. 
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daughter are linked together in the output structure. This link is also available to the printer, and hence, up] 
prints differently after the attachment. For example in figure 2, up) is printed as ,]t before the attachment 
because it dominates no words, but afterwards it itt is sprinted as I, i} because it it dominates the word 1. 


‘The machine proceeds in a middie out fashion away from te WATE: First, it tries to attach down! to up! as 
we have just seen. If that fails,it startsa new nede beewcen pl anddownl. “This is’ the’ predict operation. 
For example, suppose that YAP finishes parsing the subjeet' ‘in ‘figute 2'by'sotne yet unspecificd means, 
leaving dhe machine state ready for the predict action as in figure 3. Up! contains a clause (fs i) looking fora 
vp daugliter and down! contains a verb [yam] looking for a ‘ ) mothe ‘The predict action starts a wp node 
between up] and downl to bridge the gap. ‘The machine can, Dow continuc. by attaching up! to down] just as, 
it did in figure 2. 

YAP will continue predicting and attaching ynti].it reaches,the state specified in figure 4; At this point, up| ts 
complete, ‘The machine , will close up! by, papping it from, tac. upper Ree a removing it from memory. 
It cannot take on any more daughters. | 


3.4 The Phrase Structure Component 


Marcus’ niachine has a number of production rules like (121) to decide which actions to perform. It would be 
possible to write-a complete grammar-in this form: H-we did-so; YAP-woutd took very much tike his machine. 
The problem with writing a grammar in production rules is that the performance and the cémpetence | 


Fig. 3. ‘The Predict Action 
Predict will start a new node between up! and downl. 


sentence: I am a boy. 


input pointer: . 
before after 
[0 [, 1 
=WALL== ==WALLS= | 
[, am] [yp] 
Lact al [, am] 
[, boy | leet 
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Fig. 4. The Close Action 
Close pops the upper buffer, thus removing up] from memory. Nothing more can attach to this np. 


sentence: | am a boy. 


input pointer: 

before after 

[, | ama boy] ‘[, Lama boy] . 
vp am a boy] (vp amaboy) . . 
Inp- a boy] ==WALL== 
==WALL== Lounct a | 

founct J 


components tend to become tangled: it is very difficult to write a good structured program (grammar) with 
such clementary building blocks. [Swartout78}, [Shipman78] and d Marcus? chapter 4] have observed that 
there are phrase structure (PS) rules hidden ‘inside Marcus’ grammar. Shipman then wrote a PS machine 
which used phrase structure rules to decide when to activate rules #2 it would be desirable if we could add 
phrase structure rules to YAP so that it could select the next.action in an orderly way. The phrase structure 
componcat should cover mestingrmal unmarked cases, production rules are reserved for mar ked exceptions. 
A typical YAP ‘phrase structure rulc is as follows (omitting details): 


(124) (def-ps-rule finite-s s 
(csubj obl (s- np-)) 
(chead ob! (vp))) 


This ps-rule is similar to the two CF rules:®3 


82. Marcus has a notion of active rules. For technicat poahdnncont haleo not mcopomaied his ide explicitly i in YAP. The 
notion is in fact implicitly encoded in nodes, (Each node knows whut di isdabking for) =. 
83. Technically, itis closer to the: a CF: rules. ames ioe nontis ‘(csubi, ‘chead.. ..): are clays 
non-branching. 
s -> csubj chead 
-esubj > s- 
“csubj -> np+- 
chead -> vp 
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(125) s -> np- vp 
(126) s -> s- vp 


In English the rule says that a finite 4 has two obligatory daughters; the first is the surface subject (csubj) and 
the second is the head (chead). The first can be cither an s- or np-, and the second, a vp. ‘This rule could have 
been written as a large number of marked production rules; the ps rules are more perspicuous. For example, 
a single ps-rule replaces ten of Marcus’ rules for parsing auxiliaries.85 Sce [Shipman78] for a translation 


procedure from ps-rules to Marcus’ production rules. 


Z 


There is a PS pointer associated with cach node to indicate what the node is “looking for". A PS pointer is 
written in dotted rule notation where the dot (.) marks which terms have been parsed (sce figure 5). ‘Mhe PS 
pointer is automatically advanced when a daughter is attached.as in.the figure.* 


YAP will use the PS rules to select the next action. When there are no applicable marked rules, the 
interpreter trics to apply the PS rules. There are three possible PS actions: psvattach, Peprcdet, and ps-close. 
In YAP they are implemented as follows:®” 


(127) ps-attach: If down] can attach to up], then do so. (figure 5) 
(128) ps-predict: If the category of upl’s next eee can be determined; then predict anode of that 

category. : fe (figure 6 top) 
(129) ps-close: If up] can be closed, then do so. (figure 6 bottom) 


84. A finite clause is tensed, as opposed to an infinitive or participial phrase. 


finite: lama boy. 
infinite: To be a boy is tough. 
participtal: Being » boy. | know how he must feel. 


Raged pst the barn. the horse felt like-getting even; 

85. Auxiliary verbs are “helping” verbs-suth us: be Auve, will-can, ri 

86. Parentheses () denote optional tesms, brckets 4} denote cxclusive denivaun and * is the Kiccne star far .arbitrary 
Tepetition. Brackets have very restrictive distribution since they are difficult to express within the. determinism 
framework. 

87. ps-predict has a top-down asymmetry which is very unfortunate. To compensate for this deficiency, ‘there are quite a 
number of production rules to predict bottom-up. The grammar would be much simpler if the ps-predict.j fule were more 
symmetric. 
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Vig. 5. PS Attach 
sentence: | am a boy. 
input pointer: boy. 


I, ] ‘finite-s -> . csubj chead 
==WALL== 

lnp- I} normal-x -> cword . 
[yam] normal-x -> cword . 
Lact a] normal-x -> cword . 


Because downl is a possible csubj for upl’s finite-s, the default PS- attach rule will attach downl to up], 
leaving the machine in the following state, Noticg. that, the, PS painter associated with the s node is 
automatically advanced. 


[, ] finite-s -> csubj . chead 

[np- J normal-x -> cword . 
==WALL== F 
[y am] normal-x -> cword . 

ldct al normal-x -> cword . 


All these rules are depended upon the ps pointers; the conditionals (can attach, can predict, and can close) are 
functions of the ps pointers. These rules are the defaults which can be over-ruled in the marked case by a 
production rule. By introducing these ps rules we have greatly reduced the number of marked: productions 
rules. ‘lhe current grammar has 12 ps-rules and 69 production rules. In practice, the ps-rules and production 
rules are executed about cqually often. The PS rules were designed 4a strongly resemble Bresnan-Kaplan’s. 
constituent structure component just as the production:nulesircgemblé Marcus’ grammar. 


3.5 Ordering PS Actions 


(130) attach 

(131) predict 

(132) close 

The ps rules have an unmarked order (130)-(132) whieh-can be overnuled by. a-marked production rule. In 
the unmarked case, first try to attach down] to up] If that doesn’t work out, then try to predict. Finally, try 
closing up. Empirically, this order scems to require a minimum number of marked rules. It favors attaching 


early (low) and closing late. [ate closure was discussed in chapter 2; carly attachment is the subject of 
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Fig. 6. PS Predict & PS Close 
PS Predict 

sentence: | ama boy. 

input pointer: . 


[54 finite-s -> csubj . chead 
==WALL== 

[, am] normal-x -> cword . 
Lact al normal-x -> cword . 

fa boy] normal-x -> cword . 


Since the category up!'s next daughter is unique (it must be a vp), the PS-predict rule will start a vp node in 
down], as illustrated below. 


[, finite-s -> csubj . chead 
==WALL== 

lvpl normal-vp -> . chead (cobj) (cxcomp) 
[, am] normal-x -> cword. 

Laer al normal-x -> cword . 

[, boy] normal-x -> cword . 


PS Close 

sentence: Lam a boy. 

input pointer: 

[, Tam a boy] finite-s -> csubj chead. 

[vp am a boy] normai-yp.-> chead (coabj) .(cxcomp). 
[np- a boy} normal-np- -> (cspec) chead . 

= =WALL= = 

[punct J normal-x -> cword . 


Since up! can close, the PS close oy-cration would pop it from the upper buffer, thus removing it from 
memory, so no further attachments can be considered. 


[, 1 am a boy] finite-s -> csubj chead . 
vp am a boy} normal-yp -> chead (cobj) . (excdmp) 
= =WALL= = mo 


fount! normal-x -> cword . 
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chapter 4. The rule ordering would attach X as low as possible in structures, like (133) because ps-attach 
precedes ps-close. (134)-(136) illustrate this for adjuncts, conjuncts and optional arguincnts, respectively. The 
next chapter will compare this approach with altematives in the literature, 


(133) John called the guy who called the girl who called ... X 


(134)-John called the, guy who calied the girl who called... yesterday. adjunct 
John called. the guy; who called the girl who.called ... to make {himsclf, herself} feel better. 


(135) John called the guy who called the girl who called ... and said “hello”. conjunct 


(136) John called the guy who called the girl who called... arotten driver. optional arguments 
John called the guy who called the girl who called...up. 0) 
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4, Attachment Strategies 


Pens 


What types of information should drive the attachment: process? ‘Tharp. are-,four basic strategics in the 


litcrature:88 

(137) Structural Bias riimbati?3, 154 {Frazier and F dort [Marcus79], YAP 
(138) Lexical Expectation/Arc-Ordering - [Fodor 8} [litestian 78% [Kaptan 72) \Marincr?8,79], 
(139) Length Bias {[F ravier 79): ‘Waiver arte’ Poder 78]. {Fedor and Hrazicr80] 
(140) Semantic Bias . a oo [Crain79] 


Although there are valid arguments for cach of these. Positions, we will concentrate on the structural biases in 
this chapter. YAP can encode the other biases using. marked rules. * The structural bias i is provided (in the 
unmarked case) by the proposed rule ordering (attach, predict, and then close). It appears very similar results 


are produced by Frazicr’s two principles: minimal attachment and jate closure. ‘This idea was inspired by 
[Wanncr79 pp. 12] which relates Frazier’s principles to certain ATN actions (traverse arc, push and pop) 


which are similar to our three primitive actions (attach, predict and close.) 


88. Few papers fit the categories perfectly. For example, we have listed the Sausage Machine in two places because it has 
some structural compunents (minimal attachment and kate closure) and some length biases (Preliminary Phrase Packager). 
Similarly, we could have listed the arc-ordering papers under several headings because arc-ordcring can encude many 
types of biascs. as [Wanner79] quite correctly notes. 

89. We have very little to say about length biases. Frazier’s machine has a front end called the Preliminary Phrase 
Packager (PPP) which segments the input stream into manageable chunks that are “shunted off" to the next higher stage 
(SSS). The PPP has severely lintited memory (about six words) and it has litle or no ability to communicate with the SSS 
except to “shunt” segmented phrases which it will (almost) never see again. This model makes the interesting prediction 
that preliminary segmentation is subject to length biases. 

There are a few problems with this propusal. First off, it is not clear how to build a PPP. Purcly bottom-up 
segmentation is extremely difficult in general. unless one is will to form all possible segments (which is probably not 
Frazier’s intent.) Secondly, although the length biases are certainly real at some level. Frazier’s suggestion that they play a 
major role in parsing is extremely controversial. For example, [Wanner79] observes that the length factor does not appear 
to alter the preferred interpretation in the following sentences. 


Tom said that Bill had taken it out yesterday. 
Tom said that Bill had taken it yesterday. 
Tom said that Bill took it yesterday. 

Tom said that Bill died yesterday. 


We will accept Wanner’s criticisms of the PPP and his alternative proposal (ordering the actions: attach, predict and then 
closc), The interested reader should investigate his paper for more discussion of the PPP and how it relates to his 
proposal. 
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(141) Minimal Attachment: Attach incoming material into the phrase marker being constructed using 
the fewest nodes consistent with the well-formedness rules of the language. 


(142) Late Closure: When possible, attach incoming material into the clause or phrase currently being 
parsed. [Frazicr79 pp. 76] 


(143) Minimal attachment = attach before predict 
(144) Late.closure = close aftcr predicting and attaching 


If the two analogies, (143) and (144), are correct, then the proposed unmarked ordering | of ps rules i isa valid 
implementation of Frazicr’s principles. Her principles were designed to capture a large number of 
performance phenomena, from a psychological point of view. We will address their feasibility from a 


practical engineering point of view. 
4.1 Minimal Attachment 
Minimal attachment prefers (146) and (149) because they have fewer brackets (nodes) 


(145) ‘The horse raced past the barn (fell). iti tens ([Frazicr79 pp. 27) 
(146) +[, [np The horse] (vp raced past the barn) ... fell 
(147) —[glnp [np The horse) [, [5 raced past the barn]]} [,, felll] 


(148) ‘Tom heard the latest gossip about the new. acighbars (wasn't true), (Similar to [Frazier79 pp. 155) 
(149) +Tom heard [the latest gossip about the new neighbors}. 
(150) —Tom heard [[the latest gossip about the new neighbors] wasn’t true]. 


4.1.1 Sensitivity to Phrase Structure Rules 


There is a technical problem with this formulation; minimal. attachment is extremcly sensitive to slight 
modifications in phrase structure rutes; it would be more robust if it: counted: limiting growth (like a 
complexity argument), not individual nodes. It is. not clear, for example, that her.counting. argument can be 
used to distinguish the following [Frazicr79 pp. 24]. 


90. It is useful to further distinguish the acceptable/unaceeptable continuum. The plus symbol (+ ) is uscd to indicate a 
more acceptable sentence; minus (—) indicates a less acceptable one, 
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(151) Sam hit [the girl] [with a book] ae | . high attachment 
(152) Sam hit [the girl with a book} So low attachment 


The first has one fewer node using her phrase structure rules: they have the same number in our analysis. 
These borderline cases are notoriously difficult; human judgments tend to be unreliable and indecisive. For 
example, [Wales and ‘Toner76] have found that certain ambiguous structures have Jittle or no bias; both 
possibilities are about equally probable. ‘This fact is:not captured by niest-altucliment strategies which draw 
very sharp distinctions. Certainly, both Frazier’s minimal attachment and our ordering criteria are guilty of 


this criticism. Later in this chapter, we will discuss a marked rule (pseudo-attachment) to cover the 


ambiguous case. 

(153) np the girl] [pp with a book] Frazier's analysis high 
(154) [np [np the girl] lop with a book]] | dow 
(155) [np- [np the girl]] lop- with a book} YAP's analysis high 
(156) lnp- [np the sitll top- with a book]} OM, heed low | 


4.1.2 Expkinations for Minimal Attachment 


Intuitively, the principle appears to conserve computational resourtes, although the argument has not been 
completely formalized. ‘[Wanner79] argues that it is generally’ more efficient to attach before predicting 
because predictions postulate an additional node which presumably involves ‘a certdin additional cost. Hence 
it is generally cheaper to order attach before predict. ’ This ordering happens to be consistent (more or ess) 


with Frazicr’s minimal attachment strategy. 


It is very difficult to formalize this argument. Although it is generally cheaper to attach before predicting, 
attaching first isn’t a/ways cheakper. ‘ For example, there are ‘structurally: ambiguous sentences such as 
(151)-(152) where attaching’ first is: no more efficient: Even if there wete n discrepancics ‘between the 
ordering criteria and Frazicr’s principles, it isn't clear which explains: which. Wocs the ordering criteria 
explain the minimal attachment principle or the other way afound? ‘Nevertheless, there is'an interesting 
correlation. Despite its problems, we will accept Wanner’s account as an implementation of minimal 


attachment (and leave the explanation question unresolved). 
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[Fodor and Frazicr80] suggest another explanation. Suppose the parser byilds "several" paths in parallel. 
The first one to finish “dominates subsequent processing". ‘This provides a nice motivation of minimal 
attachment: presumably the most minimal path woutd'finish the“race” first since it constructed the fewest 
number of nodes. Similarly, they could account for the ambiguous case as "a double finish" (although 
Frazier happens to argue that this particular case is unambiguous fFrazier79 pp: 143). 


One has to be careful with the parallel processing account. If it.is taken too literally (each derivation has its 
own processor), it would trivialize the attempts to limit backup/logkakead. (by substituting hardware for 
backup/lookahead). ‘There ought to be a mechanism for bounding parallelism just as there is a mechanism 
for bounding lookahead in Marcus-style parsers. (In some sense, backup, lookahead and parallelism are all 
very similar.) Kodor and Frazier’s account would be much more satisfying if they also discussed the 


limitations of the parallelism. 


It has been very difficult to find a deep explanation for the principle because it is heuristic (in our 
framework). ‘There are several cases where the ptinciple can-be overridden. For cxamyple, there are the 
“ambiguous cascs just mentioned. Also; it has been argued that semantic and: pragmatic. biases can influence 
the judgments. Furthermore, there appear to be some empirical constructions where the most minimal 
attachment is excluded (by competence constraints) permitting a less minimal attachment. These (rare) cases 
constitute yet another class of exceptions, at least in our framework?! It is a heuristic to be applicd when 
there arc no reliable clues (semantics, pragmatics, or grammatical constraints). Minimal attachment is not like 
center-embedding, for example, which is universally unacceptable. ‘Center-emnbedding ‘can be explained by 
the FS hypothesis; we are not likely to find a similar explanation for minimal attachment. It is a "least effort" 
heuristic (in linguistic terminology, it is a "markedness” principle). “Heuristics are generally more difficult to 


explain than universals like center-embedding. 


91. Actually Frazier (personal communication) disputes this point. . Since her machine is non-detcrministic, these 
“exceptional” cases are less problematic: her machine simip'v takes the mest minimal path first and Uren backs up when it 
encounters a dead-end. Hence it wilt eventually find: dhe most. mintaial grammatical interpretation. In our deterministic 
framework, we have marked niles.to louk abead for the problematic cuses., bn either framework, though, these exceptional 
cases pose a difficulty for an explaination because it is net clear how ane can constrain the backup/luokahead mechanism. 
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4.1.3 Left Branching Structures 


There are some cases where the heuristic is crucial. For example, extreme non-minimal attachment 
(predicting before attaching) fails on a left branching structure such as (157), where it would predict infinitely 
many noun phrases.>2 Although the most extreme nan-minimal Position is theoretically inadequate, there are 
many compromise positions which may suffice. For example, a parser could make a few predictions before 
attaching, thus creating slightly non-minimal' structures without the theoretical inadequacics. ‘There is no 


explanation for the most minimal strategy. 


(157) np -> np’s n 
(158) John's father’s ... brother's friend 


4.2. Garden Paths 


Left branching is an extreme case, Frazier’s experiments were more concerned with the well-known garden 
path (GP) phenomena such as (159}-¢462).. These are called-GP sentences because the reader is led down the 
garden path so to speak. It would appear that the performance model has aptimized the process of 
recognizing the vast majority of sentences which do not.contain garden paths so that these GP sentences are 
no longer acceptable. 


(159) #'The horse raced past the barn fell. 

(160) #The ship floated on the water sank. 

(161) #John lifted a hundred pound bags. 

(162) #1 told the boy the dog bit Suc would hetp him. 


92. Some parsing models in the literature actually have this problem. For example, the LU(k) algorithm, which predicts 
before attaching. will infinitely predict on left brinching structures. Also. the Harvard Predictive Analyzer (HPA) 
[Kuno66] ran info difficulties because it predicted firs. They inverted the super heuristic.to prevent the muchine from 
predicting more terminals than there were input symbols, Necdless to sty. it is possible to do much better by attaching 
sooner as in Farley's Algorithm [Farley70]. “A well-formed substring (WFSS) table [Kuno ind-Octtinger63] would. also 
salve the problem, though it requires unbounded space. (it could be argued that- the WFSS a the necessary 
bottom-up information by constraining the search space as it doos.) 
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The GP interpretations result from attaching at the critical point instead of predicting. For cxample, the 
machine will prefer to attach in (163), thus taking the first fatal step down the garden path. The grammatical 
(but unlikely) interpretation requires predicting a clause node instcad of attaching. 


(163) [, | told the boy the dog bit] 
[, the dog bit] 


fi 
[, would] 


[, help] 


The "non-minimal” interpretation can be forced in the presence of positive evidence.” For example, (165) is 
acceptable because: there is sufficient positive informatian:.(an unambiguous + et morphological feature) to 
predict a reduced relative clause™ when the machine is in state (166). On the other hand, semtence (164) does 
not have the same reliable evidence for a reduced relative, and hong? there is insufficicnt motivation to 
predict the additional node. Since the yp can’t attach to [np- the horsc] without the reduccd relative node, 
and the reduced relative nedc ¢an't be predicted. the machine will paiclase, | ‘the only ps-action left. In this 


2Rs 


93. In our formulation the positive information will be in the limited lookahead buffer: in Frazier's model, it will be 


discovered by the limited backup mechanism. 
94. The terminology, reduced relative, comes from an n old deletion analysis which, derived (b) from @ by deleting who 


was, 
(a) the horse who was taken past the barn 
(b) the horse taken past the barn 


This construction has also been calfed whiz deiction (short for who ty’ ‘eetion), Instcad of deleting, YAP base generates 
the constriction direetly as follows: lip ' the horse (\p- 4 taken ust | the barn}. tn this analysis, predicting the relative clause 


amounts to ‘pretticting the rp- node. 
95° If YAP louked sufficiently far atiead, it would find sufficient evidence for the reduced rekative. We: are assuming that 


one gerierally doesn’t luok that far ahead. 
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case, closing is the first fatal step down the garden path. 


(164) #The horse raced past the barn fell. 
(165) The horse taken past the barn fell. 


(166) [, ‘The horse] 
[np- The horse] 
=WALL== 
taken past the barn] 


vp 
fell} 


[ 
ly 
lpunct 1 

It would be possible to parse garden paths if one looked sufficiently far ahead. Figure 7 illustrates a very 
marked rule to do so. We assume that most people do nut {ook so far ahead because they have not seen 
cnough evidence to justify the cffurt. Perhaps, psyeholingwists; with ‘their unusual peers have acquired 
a rule like the “horse-racing” rulodn figure 7.77 


These garden paths should be distinguished from center-embedding because we believe no one (not even the 
best psycholinguist) can learn to parse deeply center-embedded sentences in real time. Although it would be 
possible to add a marked rule to parse garden paths, the machine is fundamentally incapable of parsing 


96. Frazier’s account differs slightly because she uses alternative phrase structure rules. 


np -> np vp (Frazier) 


[apt Inp2 the horse} [vp raced pust the barn] 


np- => np vp- (YAP) 

lnp- np the horsc] [ip [vp 
We have attributed the problem to predicting the reduced relative node (yp-): in her framework. the problem is to predict 
the np/. The accounts are very similar (modulo the phrase structure rules). tn both cases, ihe machine fails to predict the 
reduced relative because there is insufficient evidence. 
97. Similarly, it is possible to write marked production rules in YAP which violate eee grammatical constraints 
such as Ross’ Complex Noun Phrase Constraint (CNPC)... Although mast. nermal peyple have extreme difficulty. pursing 
violations of CNPC, there are some experienced linguisis who canpot trust their own intuitions. because they can. parse 
certain violations with relative ease. Since there are some people (c.g. “experienced linguists) who can. purse certain. 
Violations, a parser should also have this capability although it may require. sume very ‘highly marked. rules, : This. position: 
is somewhat different from [Marcus79], where it is assunicd that the parser should be incapable of violating | certain 
grammuatical constraints, 


raced pust the barn]]] 
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Fig. 7. A Marked Rule to Parse a GP 

If YAP had‘a nufe like the ad’hoe “horse-racing” rule below, it could paise, The horse raced past the barn fell. 
Of course, there is no evidence that such a rule exists. (This rule also has quite a number of other probicms 
which will not be discussed.) 


sentence: "he horse raced past the barn fell.” 
input pointer: 


[, ‘The horse] 

[np- The horse] 
==WALL== 

raced past the barn] 


(defrule horse-racing 
(pattcrn (=s =np-)(=vp =v)) 
(action (predict ‘vp-) (attach))) 


arbitrarily deep center-embedding. The allowable depth is determined by the limited memory. 


4.2.1 Semantic Bias 


There is some additional evidence distinguishing the GP case from the center-embedding case. Unlike the 
center-embedding case, it is possible to reverse the judgments with priming (167). or strong scmantic clucs 
[Crain and Coker78] [Crain79} (168)-(173).. Non-minimal attachments are generally, possible: if there is 


sufficient positive evidence (linguistic training, priming, or semantic clues) to. exclude. the.more minimal 


interpretations.” 


98. It is possible to add some marked sules which would occasionally allow an extra level of embedding. - 
Correspondingly, it is possible that a person could learn to recognize an extra level of embedding i in many situations. For 
example. certain experienced psycholinguists have niemorized certiin deep ‘constructions such as: & The woman the 
man the girl loved met died. However, it is impossible to. add. cnough, ararked rules so allow arbitrarily decp 
center-cmbedding. 

99. There is one qualification: the non-minimal uilachenents are limited to open nodes. Hence semantic biases cannot 
influence the attachment decisions once a node ins been closed. For example, in structures like: 
{I said [) you said he said ...X.... Y cannot attach to 5; once it is closed, under any semantic context. (Some semantic 


contexts might block s, from closing, and hence indirectly influence attachment decisions.) 
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(167) There were two horses being raced, one out in ake field and the other past the barn. The horse 
raced past the barn fell. Oar ashe te priming 


(168) ‘The tenant delivered junk mail threw it in the trash. semantic bias 
(169) #The postman delivered junk mail threw it in the trash. . 


(170) The cheater furnished the answers passed the test. 
(171) #The genius furnished the answers passed the test. 


(172) ‘The performer sent the flowers was thrilled. 
(173) # The florist sent the flowers was thrilled. 


4.2.2 Marcus’ Account 


This account differs slightly from [Marcus79], where it would be very difficult:to state a rule which correctly 
resolves garden paths, and consequently, his machine will guess which path tu take when it cannot correctly 
resolve the ambiguity. In the garden path case, the machine will take the wrong path. ‘The semantic priming 
can be explained in the model as reversing the heuristics, Accordingly, we would Predict that (174) should be 
out since the priming has reversed the two paths. ‘The prediction i is probably correct. 


(174) ?# There were two horses being raced, one out in the ficld and the other past the barn. The horse 
raced past the barn. 


It is more difficult for Marcus to explain why trained psycholinguists can parse garden paths. Unlike the 
priming case, the psycholinguist is aware of both paths. IF the disambiguating fule cannot be stated, then how 
is it that psycholingtists seem to parse both of thent correctly? 4U #8 possible that learning psycholinguistics | 
increases the lookahead buffer, and hence, they can parse certain GPs even though most normal people 
cannot. However, we have adopted another account. Instead of saying that the GP cannot be resolved by a 

marked rule, we take the much weaker position that there must dalnaee evidence to jus the rule. 
YAP, it was found necessary to enlarge. the class of definable, ies and once, we ite to aboition Marcus’ 
position that the “horse-racing” rule (figure 7) cannot be stated; ih favor of the weaker position that such a 
rulc is highly marked. 
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4.2.3 Related Work 


“f 


‘This account is somewhat similar to [Bever7@}: where there wasia parsing strategy (175) to account for some of 
the same empirical facts. We have two sight ubjections wish his strategys (a) it-is not .as:gencral as Frazicr’s 
_ formulation, and (b) it conflates performance and competence. 


(175) Strategy B: ‘The first N..V.(N)., clause (isolated by Strategy A [which segments clauses}) is the 
main clause unless the verb is marked as subordinate. 


Frazicr’s minimal attachment also overlaps with [Chomsky and [asnik77] where somc of the same 
phenomena are described in terms of filters. Frazier’s account involves performance whereas filters 
presumably cncade competence. Chomsky. and Lasnik. say. that (176) is, ungrammatical; Frazicr’s principle 


would imply that it is also unacceptable. 


(176) *#'The girl saw you is here. 


It is very tempting to suggest an explanation. A functionalist might argue that it is ungrammatical because it is 
unacceptable. 100 It is equally mistaken to deduce that uae ety follows from ungrammaticality. A 
mere overlap between performance and competence does not constitute an ‘explanation (in cither 
direction).!°! On the other hand, the overlap is probably worth studying in more detail. | For example, one 
might look for an explanation in terms of evolution as in [Bever and Lec It is unlike to be pure 


chance. 


100. A functionalist argues that a phenomenon P is the way it is because P is a necéskaty by-product of computing some 
function. In this case, a functionalist might conclude that minimal atlabaseoloegpheias eertain ungrammaticality facts 
because certain ill-formed sentences cannot be parsed. This position is taken in [Ades79]. 

101. Chomsky and Lasnik specifically warn us about certain tempting although incorrect functional “explanations.” 
According to [Chomsky and Lasnik77 pp. 437]. Similar conclusions are conventional in ultenipls at functional explunations 
for properties of physical organs, for example. Thus we can no doubt account for properties af Me: hedet by’ considering the 
Junction of pumping blood. but no one assumes that '~ embryo decides Je-develepia Aeurt bycause it would, be seful to have 
this function filled. (Most reasonable functional explanations are at the level of evolution. Even if functionalism does not 
provide an explanation, it ts oflen uscful as a motivating force. [Ht may suggest where to concentrate the investigation. 
Although we are not advocating an extreme functional position, it can be a profitable approach.) 
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Similarly YAP, which encodes minimal attachment, docs not explain minimal attachment or any facts which 
follow from that-(e.g. certain GP. phenomena) but arcrely! prevides a description. We agree with Frazier’s 
intuition that minimal attachment is'a consegeence of limited geseurces. Even if the connection between 
minimal attachment and limited resources could: be proven, we-would not have an explanation. It. would 
remain to be seen why people adopt the proposed strategy in favor of some inferior onc. [fs minimal 
attachment learned or is it innate as Frazier suggests? “These ’ire extremely hard questions: we have only 
attempted to model (describe) the facts. ‘This work should not be interpreted as an explanation. 


43 Non-Minimal Attachment 


There are a few exceptional cases where the default order (attach, predict, and then close) would produce 
incorrect results. These exceptional cases should also bea problem for 'Frazicr's principtes (which she solves 
with a backup mechanism.) In our framework, there will be a few marked rules to cover the following 


exceptions: 


(177) carly closure (chapter 2) 

(178) transformations (chapters 6-9) 
(179) non-minimal attachment 
(180) pscudo-attachment 


Sentences (181)-(186) show that non-minimal attachment is occasionally appropriate. The first sentence in 
each group is more minimal than the others. It would appear that the parser should not blindly attach 


without looking ahead at the next constituent for one of these exceptional cascs. 


(181) E know [the boy}. 
I know [[the boy] went home]. null complementizer 


(182) John saw Tom and [Mary]. 
John saw Tom and {{Mary} saw Sue]. : ; conjunction 


(183) I told the boy [that]. 
4 told the boy [that] story}. 
I told the boy {[that] you liked the story]. lexical ambiguity 
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YAP has marked rules to cover cach of these cases. The last group are disambigyated. by the that-diag, a 
marked rule to distinguished the various senses of that,'©2 ‘Yhe first two paim are disambiguated by a marked 
rule which predicts an s when there is a node looking top-down for an s and‘there is an subject-tense pattern 


in the lower buffer. For example, an s would be predicted in (184). 


(184) [, 1] a, ae 
: knew] know-l -> head . {obj, scomp}' 


ln home] 


All of these examples appear to be counter-cxamplcs to. Frazies's' minimal attachment which arc casity solved 
though a bounded Jookahcad/backup/paralicl. mechasismi. :Vhere afe some more difficuit examples 
(involving lexical preferences) which appear te support the/are-ordcring hypothcsis.: Sentences (185)-( 186) are 
a typical minimal pair illustrating the difference between see and know, which cannot be distinguished in 
purely structural terms. Although we have not implemented a scons we sec no reason ey these facts 


favor backup over lookahead (or parallclism). 103 


(185) | saw [, the horse raced past the barn]. 
(186) I knew [np the horse raced past the barn]. 


4.4 Pseudo-attachment 


‘There are structurally ambiguous: sentences, violating: any. well ordered set-of principics; these should be 
recognized as ambiguous (or perhaps, vague). ‘These present’a probica for both Frazier's principics and our 
ordering heuristic. YAP detects the ambiguity with a marked rule. Frazicr’s.two principles secm to conflict in 
this case. In the sentences below, minimal attachment would ah the pp igh and. late closure would scem 
to attach it low. ee er 


102. Martin (personal conmmunication), has informed us that certain scnses of that were ‘more yniform in. older forms of. 
English. [tis quite possible that we are missing a generat -ition in the various lexical farms of that, 

103. In a parallel model, one could imagine that unusual lexical entries; would, Gake dopger to. fetch from memory, and 
hence, an unusual sense woyld tose the “race”. In a lookahead sysigm, it is Peameibte: to.stale the marked rules so ce will 
trigger very rarely (only in the marked case). ‘ or os 
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(187) Sam hit the girt with a book. 
(188) Sam hit [the girl} with a book. : : ~~ high attachment 
(189) Sam hit,{the girl with a. book}. abe hig Hee, dee eae low attachment 


‘There are several possible ways to deal with this apparent conflict. 


(190) Define one of the two principles to avoid the problem. A . _(Frazicr’s solution)! 
(191) Cope with the possibility of conflict. a) ‘(the “double finish” account) !® 
(192) Add an additional rule to cover the conflicting cases. (YAP"s approach) 


YAP has a marked rule to pscudo-attach (attach both ways)! when it secs both alternatives and decides that 
it cannot decide which is correct. ‘This approach is completely consistent with Marcus’ determinism 
hypothesis, YAP makes a single kf: to right: pass over,.the input.stream. without backup. Once it 
pscudo-attaches, it will. not retract the decision at.a Jager date; In:this way, Mascus’ determinism hypothesis 
allows ambiguity, even though a deterministic. PDA cxciudes ; aaa i ‘Fhe following scntenccs illustrate 
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(193) Put the block in the box on the table. 
(194) He carried nothing to indicate that he was onc of the group. 

(195) We sighted the man with the binoculars. 

(196) We never fought a bull with real courage. 

(197) He hit the man with the stick. 

(198) He seemed nice to her. 

‘The cstructure representation of these sentences in not a tree but rather a directed acyclic graph (IDAG).!98 

For: example, lop- to her] in (198) would: have two, mothers: he participial-phrasc {yp- seemed ...] and the - 
adjectival phrase lap- nice ..,j. The multiple mothers sheuld-be intespreted as ¢xclusive possibilities. This is a 


104. Frazier [Frazicr79 pp.143] argues that het late closiire principle does not apply here because the girl with a book is a 
single package. As she defines late closure. it works on packages which are roughly six words long. 

105. Suppose that the parser consisted of several parallel processes which were all competing against each other. The 
first process to finish would be the "winner" and its output would be taken-as the preferred interpretation. When two 
processes finish at the same lime, the sentence might be considered umbiguous/vague, 

106. This idea’ was’ first ‘stiggested by Mitch Marcus. Tt is ‘ simitar to Saget “tne ‘Grishman’s notion of permanent 
predictable ambiguities. [Grishman73] > 

107. “Many of these’ sentences are from {Wales and Toncr76} 

108. A DAG is a general graph (of nodes and refations) with @ condition excluding cigcular loops. Alternatively, a DAG 
is a gencralization of tree where daughters may have multiple mothers. 
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convenient way to represent certain common structural ambiguitics that occur in natural language. The 
: a Ce re ae 


cstructure of (198) would have the following representation: 


(199) |, He fy seemed [,), nice PP;] PP, 
where PP; = 


‘There are three interesting cases of pscudo-attachment illustrated by (200)-(202). {n all three cases, downl 
can attach to cither up! or up2. (See figure 8.) In (200), up2 optionally selects another daughter, whereas in 
(201) and (202), up2 obligatorily requires another daughter. In (201). unlike (202), there is another 
constituent, so pscudo-attachment is possible. There is a marked rule which considers she three possibilities. 


(200) He lup2 seems lupl nice laswi 1 to her... pseudo-attach 
(201) [yp2 Put [ups the block [goynt in the box on the table ... pseudo-attach 
(202) lup2 Put lupl the block [gown] in the box. don't pseudo-atiach 


Pscudo-attachment is not limited to just prepositional phases; the YAP implementation generalizes the 
sectnlaue to work for any kind of xp- (pp-. ap-, or vp-), not iv PP. Conisider the following examples: 


rs Tint 
2 Rs ths 4 


Fig. 8. Pseudo-attachment 


sentence: He seems nicc to her. 
input pointer: 


[, he seems nice] 

(vp seems nice] 

a ap nice] 
=WALL= = 

is to key 

lop to her] 


[punct 1 


The marked pscudo-attachment rule attaches down! to both upl mee YAP knows. that it cannot 
ae between the two-possibi lities, 
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(203) Put the block [pp in the box on the table, 
(204) I considered every candidate lap- likely to win. 


(205) He carricd nothing [vp- to indicate that he was one of the group, _ 


YAP uses a very similar technique to process certain well-known cases of ambiguous wh-movement!™ such as 
(206). These will be discussed when we consider wh-movement in chapter 8. (206) has two interpretations: 
(207) and (208). Both of these are represented within a single structure (209) where the trace NP-; has two 


mothers. 


(206) Who(m) do you want to see? 
( 207) Who(m); do you' want to sce “? 
(208) Who(m); do you want i, to see? 


(209) Who do you want NP-; to sce NP-;? 


where NP-; = lnp-! 


binding. The basic idea is to | avoid ‘making aria, decisions uni, there is enough information. ‘This 
approach can be contrasted with an are ordering technique (such as [Kaplan72)). In Kaplan’s scheme, the 
possible decisions are ordered so the most plausible decisions are made first. In a delayed binding scheme, 


the system tries to avoid discriminating between possibilities as long as possible. In some cases, the system 


may never really distinguish between certain possibilitics.!!9 ~ 


109. Wh-movement refers to a class of constructions including relative clauses and wh-questions. These constructions | 
relate a wh-word with a gap which is represented by a 7 (for trace). Traces are represented in YAP as. phrases which 
dominate no words. 


relative clause: [saw a boy who; you know ti. 
wh-question: Who, did you sec t;? 


This will be discussed inmore detail in- chapter 8. ; ve 
110. (Vant-chn78] observed that informants sometimes claim they cinder a somenace with multiple: sguuiailiets until. 
they are asked questions regarding quantifier scope. The subjects will often admit they hadn't considered the scope issue. 
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‘There are limitations to the particular implementation of delayed binding in YAP. It may be impossible to 
encode all grammatical ambiguous interpretations. We claim that pscudo-aftachment can work for acceptabic 
interpretations; the other grammatical interpretations are unacceptable. Unfortunately, it is very hard to test 


this claim. 


It appears that pscudo-attachment cannot represent all CF interpretations because the device docs not have 
CF generative capacity. One could view. pscudo-atlachment as annotating: one of the attachments (the 
canonical attachment) with several alternatives. ‘The weak generative capacity will be the same as the 
canonical structure; pscudo-attachment does not affect the weak gencrative capacity, only the strong 
capacity.!!! Assuming that YAP is equivalent to a deterministic PIDA,"!? it has the weak generative capacity 
of a deterministic language (i.c. [.R(k)). Since L.R(k) languages do not include all CF languages, there are 
some CF languages which cannot be described using pscudo-attachment.!!? We claim that acceptable 


sentences can be described with pscudo-attachment.!!4 


Pscudo-attachments should not be undone at a later date. ‘There are certain probicmatic cascs where the 
simple scheme described above will run into trouble. There are several possible replies. Some of the 
interpretations are probably unacecptabte: Perhaps the rest could be processed with more lookahead. There 
are some problems with pscudo-attachment: nevertheless it is an interesting alternative to purely, 


non-dcterministic strategies. 


11]. The weak gencrative capacity is the set of sentences generated by a particular grammar. The strong capacity is the 
set of derivations. In gencral, the strong capacity is much larger since an ambiguous sentence corresponds to several 
clements in the strotig generative capacity, bul only one in the weak generative capacity. (Since the class of the machine 
(FS, CF. CS. TM) is tied to the weak generative capacity, pscudo-attachment can be implemented without moving to a 
higher computational class.) 

112. {tis conjectured that YAP would be a deterministic PDA if the stack bound were removed. 

113. For example, there is no LR(k) grammar for an iaherent/y ambiguous language. 

114. This assumes that acceptable sentences form an ER(k) language. Even stronger, this result should follow from 
Marcus’ Determinism Hypothesis and not from our FS hypothesis. (It trivially follows from the FS hypothesis since all 
FS languages are also 1.R(k).) Otherwise, it isn’t clear how ambiguous parses could be found short of exploding the state 
space as suggested in chapter } when Marcus’ Hypothesis was first mentioned. “ti other words, we are assuming that 
acceptable sentences (even with arbitrary center-embedding) are still weakly cquivalent.to an LR(k) language. 
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(210) Put the block in the box on the table PP* into the basket. 
(211) 1 consider every candidate likely to seem XP-* eon" 


Sentences (210)-(211) illustrate a aioe with vtsde attachment; the final constituent, which is arbitrarily 
far from the decision point, selects the higher attachment as in (212)-(213). But without the final underlined 
constituent, the cxamples arc highly ambiguous as (214)-(215) itkistrate. Phe probteni is that YAP has to look 
at the final constituent before it can determine whcther ‘or nit:to peed ue “The final constituent might 


be arbitrarily far away. 


(212) Put [the block in the box on the table PP*] [into the basket... - - unambiguous 
(213) I consider [every candidate likely to seem XP*] corrupt. 


(214) Put {the block] {in the box on the table PP*]: . highly ambiguous 
Put [the block in the box on the table] PP*. aa 

(215) l consider [every candidate] [likely to seem XP*]. 
I consider [every candidate likely to seem: XP*} 


We will make a simplifying assumption that the intermediate ee all aad the same ways. Only the first 
and last few phrases i in a sequence (XP*) can be pscudo-attached, itis assumed that the intermediate phrases 
all attach the same way. Consequently, the pscudo-attachment decision depends on just a few phrases (downl 


and down3 of (216)), not on an unbounded number. 


(216) [, put the block} 
\vp- put the block] 
lnp- the block] 


==WALL== ; 

[pp- in the box] the first xp- 
pp* . the middle xp-* 
[pp- into the basket} | _ _ the last xp- 


115. This example was suggested by Joun Bresnan. . 
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itl 


example, there are an unbounded number of grammatical ingemacatiod: “this nicchanisas on somites a 


bounded number:!! 


(217) | put {the block pp*] pp 
(218) I put [the btock pp] pp* 
(219) I put [the block] pp* 


We claim that the others are unacceptable (in the absence of positive evidence such as semantic bias). ‘There 


could be marked rules to consider semantic or pragmatic clues. 
4.5 Summary 


‘The cstructure implementation has been outlined. Unless there is an applicable marked rule, the interpreter 
runs the phrase structure rules in an unmarked order. ‘The unmarked order was chosen to be compatible with 
Frazicr’s two principles: late closure and minimal attachment. We have disgussed several classes of marked: 
exceptions (220)-(223). ‘Ihe description would be more autractive if the role as these marked exceptions could 
be minimized. This is an arca for future research. . , 


(220) carly closure (the A-over-A closure principle) 
(221) transformations _ 

(222) non-minimal attachment 

(223) pscudo-attachment 


In the next chapter we will show how fétructure can be built from estricture without violating memory and 
backup limitations. 


116. There would be one other interpretation if pur didn't Subcategorize | fur an obligatory second object: 
I stwf{the block pp*}. 


Functional Structure Implementation -72- Section 5 


5. Functional Structure Implementation 


The previous chapter sketched out YAP’s basic machinery for constructing the constituent structure 
(cstructurc), based solely upon category (n, v, np, vp, §, ..) information. The estructure is an intermediate 
representation toward obtaining the predicate/argument relations (fstructure). Computing the fstructure 
involves a number of syntactic features (propertics). It is casy to find minimal pairs such as (224)-(229) 


illustrating the necessity of certain syntactic features. 


(224) ‘That ball is round. . nuniber 
(225) That balls are round is a fact. 


(226) Have we eaten? case 
(227) Have us eaten! 


(228) Have the boys take the exam! . tense 
(229) Have the boys taken the exam? ; 


Each node (phrase) has a number of syntactic features (eg. person, number, gender, cas¢, tense and mood) 
and a number of grammatical roles (cg. subject, object, etc.) ‘This chapter will outline a procedure for 
assigning features and roles. ‘The problem is interesting because feature dependencies can cross seemingly 
unbounded distances. Nevertheless YAP has a procedure for manipulating features that doesn’t violate the 
severe resource limitations (memory and backup). The feature manipulation problem is similar the 
inheritance problem [Fahiman77] [Martin79, 80]. which is known to be very hard. Fortunately, the 
Bresnan- Kaplan linguistic theory provides us with just the necessary simplifyiag constraints. 


Many parsers compile the feature information into the parts of spcech (category), conflating constituent 
information (n, v,...) with functional information. Perhaps the most extreme example is the Harvard 
Predictive Analyzer (HPA) [Kuno66] which used about 180 parts of speech to distinguish everything from 
number to subcategorization frames. We accept the proposal that the two structures should be 
independent.!!7 In addition to her linguistic motivations, there are some computational advantages for 


dividing the problem in this way. It is often uscful to delay certain decisions as long as possible. ‘The HPA, 


117. The independence property is central to the Bresnan-Kaplan framework though it has appeared in carlier models. 
inchuding ATNs. 
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with its 180 parts of specch, couldn't separate. the distinctions which require immediate resolution from the 
ones that should be delayed. Consequently, it found many more ambiguitics than most people consider. For 
example, the HPA finds three interpretations of (230) where-most peopic notice only two, if that many. Some 
of these distinctions should be delayed (perhaps indefinitely): The multipfe interpretations of flying planes 


are far more striking than the possible senses of are. 


(230) They are flying planes. 


(231) ‘Mey areay [vp flying planes} 
(232) ‘Ihey ACoonula Inp flying planes] 
(233) They are flying planes] 


copula (vp 


YAP, as opposed to HPA, carries along multiple functional possibilitics until there is some reliable 
information to resolve the various alternatives. In this way, YAP can manipulate feature dependencies over 


unbounded distances without violating Marcus’ Determinism fypothesis. 
5.1 Seemingly Unbounded Dependencies 


We will illustrate a typical “unbounded” dependency in the. features between two nodes and then show how 
the dependency can be captured with only finite memory. ‘The method is in fact fairly general since it is based 


on the Bresnan-Kaplan linguistic theory. 


(234) There is a problem. 
(235) There are problems. 


(236) *There are a problem. 
(237) *There is problems. 


There-insertion sentences such as (234)-(237) have two dependencies: 


(238) subject-verb!!8 agreement 
(239) there agrees with its object 


118. Grammatical poles (subject. object, predicate, ctc.) will be undefined for the time being. The intuitive notions 
should suffice for the current discussion. 
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These dependencies can cross an unbounded ‘amount of material as the following sentences illustrate: 


(240) There seems likely to seem likely to seem likely-... 40 be a problem... -. 
(241) There seem likely to scem likely to seem likely ... to. be problems, . 


(242) *There seem likely to seem likely to seem likely ... to be a problem. 
(243) *There seems likely to seem likely to seem likely ... to be problems, 


In these raising! 19 sentences, cach embedded phrase takes an understood subject. ‘The dependencics can now 
be stated locally, although they have unbounded consequences. ‘That i is, the highest subject agrees with the 
tensed verb and the most decply embedded subject agrees with the object, Furthermore, all the understuod 
subjects are related, so they inherit cach other's constraints. Much of this chapter is concerned with the 


inheritance mechanism. 


(244) There, scems x, likely x¢ to seem ...%, tobe a problem. 


We will use a variable x as a place marker to represent the understood subjects. Now. the two dependencies 
are local; there, agrecs with seems and x, agrees with a problem. Since the subjects are related, the procedure 
has unbounded conscquenccs. Nevertheless the procedure docs not require inordinate resources. 


5.1.1 Grammatical Roles 


The notion of subject is crucial to this formulation. The Bresnan-Kaplan analyses use a number -of 
grammatical roles including subject, object, obj2 (second object), xcomp (adjectival, yerbal, or prepasitional 
complement), scomp (sentential complement) and predicate. Grammatical roles are aesence by structural 


and lexical constraints. For now, we will give an example to illustrate the intuitive notions: 


(245) subj I saw a boy. 

(246) obj I saw a boy. 

(247) obj2 I gave a boy a ball 

(248) xcomp He seemed likely to be nice. 
He sceined ty be nice. 
I gave a ball to a boy. 


119. Raising is « particular linguistic construction which has received considerable attention in the linguistic literature 
(sec [Postal74] for a long list of references). 
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(249) scomp It seemed that he was nice. 
(250) pred I saw it.!29 


‘These are all slots in the fstructure. Grammatical relations are extremely useful for describing many linguistic 


phenomena (sce {Bresnan8op. '4! 


5.2 Constraint Propagation Solation 


This feature manipulation procedure can be vicwed as a constraint propagation problem [McA llester80] 
[Mackworth77] [Waltz75]. ‘The problem is to propagate the agreement dependencies through the fstructure (a 
graph of grammatical roles). (See figure 9). Initially, all possible values are assigned; the number values are 
{singular, plural}.!22 Extrancous values are first weeded away by the Iexicon and then by agreement 
constraints, [In this way, multiple possibilitics are carried along until there is sufficient information to 
disambiguate. YAP docs not randomly try alternatives. (non-deterministic); heuristic guessing is avoided 


whenever possible, 


Figure 9 shows an fstructure after lexical specifications but before the constraint propagation. For example, 
the Icxicon specifics that a problem is singular ({singular}) and there is cither singular or plural ({singular, 
plural}). After propagating the two agreement constraints, X9. Xq and X¢ will al be singular (their number 
propertics will be {singular}). ‘The sentence, There seem ‘fikely to be problems, has a similar fStructure except 


X, X4, X and x7 are plural instcad of singular. 


120. In the Bresnan-Kaplan framework, pred is a feature. nol a grammatical role. We have placed it here because it is 
defined over a large set unlike the other features such as perso: ‘haniber and gender. 

121. Chomsky (personal communication) has criticized grammatical relations as an inadequate explanatory theory. 
Although it is possible to.deseribe the-facts. sarling from grammatical relations, 1 truly explanatory theory would have to 
derive grammatical relations themselves. Chomsky argues that deriving grammatical relations from structural notions is 
the hardest part and consequentlycthe notion isn’t very ase full it a Hingttstic theory,” This point is extremely controversial 
Nevertheless. explanatory adequacy is somewhat orthogonal to processing issues; for our purposes "mere" descriptive 
adequacy is sufficient. (Descriptive adequacy is no simple task.) 

122. We are assuming that features are defined over sniall sets of possible values. There are some theoretical difficulties 
associated with propagating grammitical roles since they have potentially unbounded ranges. The actual implementation 
has a special symbol (*undefined*) for the universal set of grammiatical roles. 
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Fig. 9. Constraint Propagation 
There seems likely to be a problem. 
‘There, ae X4 ee X6 to ee a itis 


The fstr ucture graph (before propagating the agreement cae aia is given ‘bekaws (omitting certain details); 
The two agreement constraints are subject-verb agreement (x5 with x1) and there-insertion (x6 with x7). The 
constraints are sufficient to uniquely determine the number features ({singylar}).: : 


before after 
x 1 pred: seems . 
tns: {prés} 
subj: x5 
_XCOMP: X3 


x9 form: there 
num: {singular, plural}: ‘ {singular} 


x3 pred: likely 
subj: x4 
XCOMP: Kg 


X4 is-bound-to: x5 — He 
num: {singular, plural} _ {singular}. 
X5 pred: there-be 
subj: X¢ 
obj: x7 


X6 is-bound-to: x4 
num: {singular, plural} {singular} 


x7 pred: a-problem 
num: {singular} _ {singular} 


The two constraints are subject-verb agreement and. there-insertian.- In; this framework, subject-verb 
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agreement is enforced by intersccting the agreement features of a tensed node with its subject.!23 In figure 9, 
the number features of the tense node X1 are intersected with its subject X», making X4's number {singular}. 
Similarly, there-insertion constrains x¢ to agree with. x7, making, ae & nuraber featurc {singular}. By 
is-bound- to cdges, the agreement constraints propagate all the: way through dhe: graph, making all the number 


features {singular}. 


If the constraints were inconsistent, some slot would have no »pessible values, and the sentence should be 
ruled out. For example, the ungrammatical sentences, “There seem likely. to be a problem and *7here seems 
likely to be problems, are bad because their fstructures have no possible values (i. c. mt for the number slots; 
one agreement constraint weeds out the value singular and the other removes plural. ‘The ungrammatical 


sentences are functionally inconsistent. 


If the constraints underdeterming the solution, some slots s will have several | possible valucs, sad the sentence is 
considered vague (or perhaps ambiguous). 14 The number features in (251) and (252) are all {singular, 
plural} indicating a number ambiguity. {n (251) there may be one or more "deer"; in (252), there is an 
ambiguity between the inner and the outer interpretation. 5 Sentence (253) has underdetermined tense 
({pres, past}) since put is lexically ambiguous. The underdetermined cases illustrate that the evaluator can be 


so lazy it may never get around td making a decision. 


(251) The deer might be nice. 
(252) The family might be nice. 
(253) I put it down. yl Pah, 


123. Actually, tensed verbs dont have number features themselves, but nuther-assign aumber features-to their subjects. 
For example, seen assigns singular features to its subject, although it is not singular itself. This point is important in 
cxamples like That they seem to be nice ts a foe? where ae pene ehous ainiguter even theuigh its inain verb(seem) 
assigns plural features to its subject (they: eas ral di 

124. An ATN modef can distinguish between: agueniie snl 1 eonibhgietey wich it hes (we mechanisms: -underconstrained 

values (vague) and non-detenministic assignments Sane fi our ubeasi inte: we'don't have the second mectein sn 
and henee we cannot (currently) distiagertsh the two cases: 

125, Cotlections can be viewed as: iniay dndiv fea ents Ger, and hence: nh ‘or ey can n be viewed as a single 
conglomerate (outer), and hence singufar.' : 
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5.2.1 Representation Issues © 


Keciture values are represented in’ bit vectors!2® so that cach set (i.c..{singular, plural}) requires ‘a constant 
amount of memory (independent of its size.): “hat is, the set {singcilar} and the set {singular, plural} require 
the same amount of memory. Unlike most non-deterministic systems, the ambiguity does not consume 
additional resources (time or space); the number feature requires cxactly onc bit vector in any case. ‘These 
representation issues can have a fairly important impact on the overall performance of the system; it is often 
worthwhile to take advantage of the particular parallel construction of the machine at hand in order to avoid 


potentially expensive non- deterministic searching. 


The features in figure 10 have been wickseacd? Each possible wilue is represented by a single bit; 
= possible, 0 = impossible. For cxample, if the gen and dat bits are set, then the case is cither genitive or 
dative ({gen, dat) In this representation, iti is is panicilarly easy to me nodes: we simply intersect the two 


Fig. 10. Features 


feature possible values 

case gen dat nom acc 

gender mnf 

pnc sl s2s3 pl p2 p3 

def +- 

pro + — 

tns tnsless pres past +ing +en 

mood decl wh-q yes-no-q imperative exclamation subjunctive 


126. A bit vector is an array of binary vasiables, It is very similar ta.standard set of binary valucd features. We have 
chosen this representation for efficiency reasons: it requires minimum space angd-certain operations (store, fetch and 
merge) can be done in parallel because LISP. has operations (for. nes Jogtcal operations in parallel on a single 
machine word (32 of 36 bits depending. onthe. particular hardware)... 

127. Category (s,n,v....) is not implemented in this way because cakgoeyt feaunres afe not percolated through the 
fStructure like the uthers. For example, although there is good evidence thal aaour phrase inherits a number value from 
its determiner (this boy, these boys), it is much harder to argue that it inherits a category. value. Category is defined to be 
part of the cstructure. 
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bit vectors.!28 We are crucially depending on the fact that features range over a small finite set of possibilitics. 
5.2.2 No Disjunctive Constraints 


There is a crucial linguistic assumption that enables the constraint Propagation technique: to work: there are 
no disjunctive constraints. It would not be possible to enforce a rule for example that required the first 
daughter to agree with cither the second or third daughter. Disjunctive dependencies are known to be 
computationally difficult because they involve postulating several possible worlds which may have to be 


considered non- deterministicall y; fortunately they don’ Loften appear in natural language syntax. ea 


128. Person and number have been combined (pac = person/number code) because there ate ofien ‘disjunctive 
constraints between the two. For example, the, noun Block can. lake any. person valug.and uny number valuc, but the 
values are not independent (it cannot be s¥'= third person singular). This encoding trick is taken from Parsifal. Kaplan 
(personal communication) mentioned that his. APN: parser used the: sume tick.” (One could argue that ins and pne are 
somewhat analogous: there are some words which have either may features or pre features. but not both. For example, the 
lexically ambiguous word blocks is cither pres or 3, but not both, This idea has not been implcmented. ) 

129: Martin: “(personal communication): Knows of only one “gy ntactic constriction | which suggests disjunctive 
dependencies. . The pamtitive noun, phe Rind. of dogs might be cither singular-or phural.--l seems to inherit its: foaturcs 
from one oF the other of its parts (but not peeanly both). 


What kind of dogs are those? he 
What kind of dogs is the most popular? 


Perhaps kind is not {singular}, but rather it is vague ({singular, plural}) between the inner and outer plural. The 
following pairs illustrate similar ambiguities, 


The bellows are coming apart. 
The bellows is being repaired. 


The conmmitice are fighting among themselves. 
The committee is fighting the regulation. 


This approach avoids disjunctive constraints, which are, compitationally problematic. Anstcad of postulating a an arbitrary 
number of possible worlds, there is only one ‘possibly world which encuiles, the ambiguity (ic, {singular, pJural}), The 
system will not hypothesize which possibilty i is correct until there i is sufficient information, to be surc. In truly ambiguous 
sentences, the distinction will never be made. 


The deer might have done it. 
The fish shouldn't have. 


This is consistent with the wait and sec approach. (The set of pussibilitics (i.¢, {singular, plural}) are stored in bit vectors: 
the information associated with a set it independent of the number of pessibiliues.) 

Kaplan (personal communication) has Suggested that lexical ambiguily ind lexical redundancy rules are a very serious 
source of disjunctive constraints. His point is well wken, though progress. is begin made. Robert Milne is currently 
working on the fexical ambiguity problem [Milne78a, 78b, 79, 80). We will discuss: our own solution to optional 
transformations (and fexical redundancy) in chapters 6-7. 
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5.2.3 Bind* is an Fquivalence Relation = 


There is another uscful simplifying assumption: the is-bound-to relation? forms naturat cauivatence 
classes. 13) We will replace the relation with : reflexive, panels transitive closure: bind*. In figure 10, 
the cmbedded subjects ‘are all bound ‘to one another forming a ‘single ‘equivalence class (under bind), 
Equivalence classes can be represented very ‘efficiently: instead ‘of ' storing each ‘clement individually, it is 
possible to store them collectively asa class, often saving ‘considerable memory. ‘The equivalence relation 
‘Fepresentation contains fir less information ‘than an arbiiraty ‘elation, his is very important for YAP, since 
there may be a ‘bounded number of classes, even though there ; aré an unbourded n number of clements. 


The equivalence property is a stipulation. We cannot currently cxptain-why-it fits the empirical data as welt as 
it docs. ‘The. theory would,be more autractiye if this assumption did. wot haye tw be stipulated.'2? It may be 
possible to explain it in (erms of other indepetidently miotivaed, Hedin Nevertheless,” it seems to be 
consistent with the facts and it enables a computational eptiminatin.) 


pas 


YAP dogs not assign Features, to nodes individually but rather “0 aauivalence, classes, collectively. All the 
co-indcxed subjects in figure 10'would share a single bagof featutes.'™ "Phat is inp, xg anid in figute 10 are 


iO) gates bho ta adel 


represented collectively in the optimized fstructure (256) under x “In many parsers (including Marcus 
Sw ., 


Parsifal), cach embedded subject would be represented individually. 


sdmiise ft 


130. Our use of is-bound-to is very similar to transformational movements in Chdmshky's: POR: when ‘we bind two 
positions, he would move a constituent from onc position to the other. Te absa i 

131. This property is implicitly wspumed in 1 {Kaplan | and iat 

132.’ Theft are'some very interesting thivorttical igstics here. ‘Th esa Rap aplan, framework stipulates that binding i is an 
equivalence Telation: Chomsky tik lekes NO. onitiitmient either’ ae” hich’ “be i s belie? Suppose | there were no empirical 
evidence tv decide the matter. “(Convincing évidénce is very hard to come’by. j Oni vihe one hand, the equivalence relation 
is an additional stipulation and hence it is undesirable. But on the other hand. “the ‘equivalence relation requires less 
information to represent (than a more general relation) and hence it is to be preferred. Fhereds.a. curtain advantage in 
having a more restrictive theory. [tis not clear whether it is theoretically more desirable to have-dewer stipukwions or a 
more restrictive representation, 

133) Although a processing’ argument alone: is not adequate jlslificytion foi adopiing. ‘the proposed assumption 
(movement is equivalent to bind*), it should be ‘sufficient motivation. la stu he proposal i in greater detail. 

134. Chunisky (personal communication) has Proposed that case “might bet signed to catch I 


\ ndex (i.c. cach equivalence 
class), not 10 individual noun phrases. Wisi fact that turindexed noun frases. receive case actly | pnce. 1 

135. This is inefficient in both space’ ditd time. In Parsifal, for along it'ean take unbounded time ho trace the binding 
pointers back to the lexical subject. 


2 
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(254) There scems likely to be a problem. 
(255) Theres seems, x4 likely; X6.t0 besa problem. 


(256) x l pred: seems 
tns: {pres} 
subj: x5 
xcomp: X3 
: X9.%4.%—@ form: there 
num: {singular, plural} 
X3 pred: likely 
subj: x4 
XCOMP: X« 
Xs pred: there-be 
subj: X¢ 
obj: x7 
x7 pred: a-probiem 
_hum: {singular} 


Co-indexing is a unification procedure. Whenever two nodes are co-indexed, their features are merged 
(intersected) and placed in shared memory. Updating one node's featurcs would affect the other because 
their features are being shared. In this way, an.wnbounded-number of nodgs equid be affected with a single 
update, since they might all be sharing the same features. ‘This is how the "unbounded" dependency in (254) 
can be realized with only limited working memory. 


Although the dependency is “unbounded” in cstructure, it is bounded in fstructure, which uses the more 
efficient equivalence class representation. A grammatical role (i.c. subject) refers to an entire class (with 
potentially unbounded membership) such as {X>, X4 X}, not to an individual member. Consequently, it is 
possible for YAP to enforce these agreement constraints very efficiently in the fstructure since they mention 


only a bounded number of classes (grammatical roles).!36 


136. In the Bresnan-Kuplin: framework, agreement dependencies are not allowed to refercnce more than four 
grammatical roles in a single rule. 
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Another attractive computational property of equivalence relations is-asgeciativity: they can be constructed in 
any order. x could be unified with X4 and then with X6 or'the other way around: ‘The: fstracture will turn out 
the same whether constraints are propagated cyclically!37 (bottom to top), inverse cyclically (top to bottom), 
or inside out. ‘The results are invariant with the order of application. Invariance. is very, convenient; a parser 


can then enforce constraints in the most natural order (Ieft to right). 


Invariance does not follow from most definitions of movement because a lexical object cannot be moved until 
it has reached the source of the movement. Consequently it makes a difference whether movements are 
computed cyclically or not. Perhaps movement should be redefined w be Tescative: 138 Similarly, the ATN 
SENDR opcration (which manipulates feature registers) is non-associative. This too cotild be redefined. 
Actually, part of the motivation for defining the Bresnan-Kaplan merge operator was to rid the asymmetry of 


the A’TN SENDR [Kaplan (personal communication)}. 
5.3 The Bresnan-Kaplan Analysis of There-insertion 


We will compare our analysis with the Bresnan-Kaplan analysis; YAP was;designed ao Ahat it could casily 
incorporate many of their ideas. Consequently, we were able to borrdwiimany: analyscs, such as the 
formulation of there-insertion, saving us considerable time and peneray. boats are not interested in veinventing, 


all of linguistics. this thesis is mainly concerned with processing constraints. : 


The problem is to build a fstructire fia ni cadeSicitl ye solani the’ fétructare come'from the 
cstfucture (e.g. the subject is the first np undcr'tense) and the lexiton. ‘ASty pied strtetaral’ dependeiity relates’ 
a noun phrase in "subject position" (immediately dominated by & tensed clattse} With the fatincture stot: frubj. 


Similarly, there are lexical constraints indicating, for example, that problem i is {singular}. prolens: is {plural} 


git 


and deer is {singuiar, plural}. (257)-(288) link estructure Positions with | grammatical re roles, ithe remaining 
functional slots will be fi Ited i in by the lexicon. a . | 


aged fe dae * : Shia G 


Peg ass SEN : As OTE hess Se my 

137. [Freidin78} has observed that cyclicity is derivable from independently motivated assumptions. In this framework, 
the cyclic order generates the same results as any other order. We could interpret Freiden’s results to say that order is 
irrelevant; the facts that were once explained using ordering constraints are covered under more general binding 
conditions. fe devine, yee ed Sedans i 
138. A movement could for example Icave a sink behind to swallow up the lexical phrase when it finally does arrive. 

There could be a well:formedness condition, blocking final:struclyres. contajuing sap ied sinks; Sonata isan to a free 
indexing scheme [Koster78].) hy ae 
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§.3.1 Structural Constraints 


(257) up:s -> dl:np d2:vp 
dl = subj(up) 
d2' = up 


(258) up:vp -> dl:v (d2:np) (d3:xp-)}49 
di = up 
d2 = obj(up) 
d3 = xcomp(up) 


Examples (257)-(258) are a slightly modified form of Bresnan-Kaplan’s notation. It has been changed to more 
closely resemble YAP’s notation and to be casier to type. 19 Roth (257) and (258) show a ' phrase structure tule 
followed by a number of constraint equations. For example, (257) gives an expansion for s; it has two: 
daughters, the first is an np and the second is a vp. There are two constraint equations below'the ps rule which 
fill in functional slots by a unification (co-index) operation. For example, the first equation, d/ = subj(up), 
defines the np under s to be the subject,!4! by unifying the first daughter (an np) with the subj slot of up (an 5). 
After the two nodes have been unified, they share the same memory so that further constraigts on cither node 
will affect the other. Hence the unification operator is the bind* cquivalance relation; the classes are 


represented collectively in shared memory. 


The second constraint equation d2 = up unifies the head of a phrase with its mother. This follows from x-bar 
theory [Jackendoff77] {Chomsky70] where phrases are defined as a projection ofa head. F or example, a noun 
phrase, such.as the she boy; is a projection of its head noun. boy, Similarly, an sis. projection of its head, a vp. 
Again, from x-bar theory, it follows that all. features. percalate.up from the head. For cxample, the noun 
phrase the boy is singular because its head.is singular. Similarly, |, | saw his], has past: tensc because its head 
vp has past tense. Functionally, one cannot distinguish a mother from its head, and consequently, they are 


139. The pscudo-category. xp stands for onc of the following: ap-. ¥p- OF pp-. 
140. YAP uses more mnemonic names: names like di. d2.... da'are _Peplaccd wit _csubj, cobj, cxcomp, .... The letter ¢ 
indicates a cstructural relation, as opposed to an f for a functional role. Where we have used up and d, Bresnan and 
Kip: in would use up arrows and down’ arrows, respectively. Atso. instead of tiurmbeFing: the @atighters as we have, she 
writes the constraint cquations andertieath the appropiiaic diughter. Certait? constraint’ Gqiuations can be tinthststood as 
the unmarked case, so they need not be restated for cach ps nile. See (Rapti ind Brosnan 80: * aes 
141. Technically, the subject is the fstructure of the np under $fhot the’ ap ies" The subject docs not inchide the 
estructure of the np (category and surface ditughters). 
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represented as a single unified node in fstructure. 


§.3.2 Lexical Constraints 


The remaining constraints come from the lexicon. A lexical entry looks very similar to a phrase structure rule. 
It defines a functional frame (instead of a constituent frame) with constraint equations between slots. We 
have used the dummy variables a/,... an instead of di, ..., dn to distinglish: Fanktiotial: arguments: from 


constituent daughters. The following lexical entrics are relevant to the example at hand: sbi 


(259) scem -> al:{vp-, ap-}!*3 
__ al = xcomp(up) 
. subj(up) = = subj(al) 


(260) likely -> al:vp- Ab 
‘al=xcomptup). oi. 
subj(up) = subj(al) 


(261) there-be -> al:np- 


1 = obj(up)* 
hunsubfupy) = num(al} Re te hae? ie 
form(subj(up)) = there ua be oe gd eae 


§.3.3 Well-Formedness Conditions 


The furictional structure is completely constrained by the constraint ‘Sauations in the ps tulcs and’ the lexical 
entrics. 41'Hé ‘functional ‘structube ‘must ‘meet’ thrte = — scien ‘sohotvnce 
atid cc conistency. Bath lexical! etry efits a Fericthnil ee sete eT 


MI 


142. We will not discuss the internal strucjure c of noun phrases at, ‘this, lime, For, now, we will use the ad hot predicate 
apr roble mi la represent the structure of Lip § Fy problem]. 

143. Technically. lexical predicates are npt, allowed to refercncy, the estrugure. “Gusegon, and surface daughtcrs),, The 
Bresnan-Kapkin formukalion replaces our xcamp with a scomp (a. yp; complement). i aucomp. (3, ap complement), a poomp 
(a pp- complement) and a ncowip. tap comp cree) ad 

144. Subject-verb agreement. wis nub. described. There. is a, lexical, calry for ‘cegh form: ‘of the verh: cach asserting. 3 a 
different constraint cquation on the subject. For example, seems would haya a rule Ike: nung subiup) = {singular}. 
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(262) cach slot must be filled (completeness) 
(263) and only those slots may be filled (coherence) 
(264) and multiple assignments to a particular.slot must be consistent . 


Sentences failing to mect these conditions are ungrammatical as (265}-(267) illustrate. 


(265) *Vhere is. | | | incomplete 


(266) *It seems John to be a nice guy. incoherent 


(267) *There are a problem. . inconsistent 


[Kaplan and Bresnan80} give an. algorithm for instantiating Icxical-éntrics;: we will not review it here since 
they were not concerned with the same resource fimitations. 
5.4 Implementation of Functional Structure 


Examples (268) and (269) illustrate a typical phrase structure rule and a typical-lexical predicate.!45 


(268) YAP’s Notation Bresnan-Kaplan-like Notation 
(def-ps-rule finite-s s s -> dl:{s-, np-} d2:vp ps rule 
(csubj obl (s- np-) dl = subj(up) . 
(action (merge down (get-fsubj up)))) d2 = up 
‘(chead ob! (vp) 


(action (merge down up))) 


(def- pred seem-1 seem scem -> al:{vp-, ap-} lexical predicate 
(fxcomp obl (vp- ap-) al = xcomp{up) 
(action (subj-control up down)))) subj(up) = subj(al) 


YAP"s ps-rules and lexical predicates share similar syntax, (269) and (270). Both of them are CF rules with 
Bresnan-Kaplan constraint equations encoded into the nonterminals (i.c. <term>). A <term> is defined as 
(271) below. 


145. By convention, all functional slot names will begin with an £ whereas all constituent slot names will: being with a:c. 
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(269) (def-pred <predicate name> <stem> <term>*) oa fe Ang ene rule 


(270) (def-ps-rule <ps-rule name> <category> <term>*) ar i ps rule 
(271) (<role> <OBL igatory, OP Tional, or STAR> <passidle eateguries> (action <tisp coda>)) © jenn + 


Recall that YAP’s attach operation autumatically ‘advanees the det"? in a ps-rale pointer past 4 nonterminal: 
In addition to updating the ps-rule, advancing the “dot” also invokes the constraints associated with the 
nonterminal. ‘That is, when YAP attaches a daughter to a mother, the daughter i is given the <role) i in the 
mother’s frame, and secondly, the <action> field is evaluated with up and “hiwik bound | to the mother ad. 
daughter, respectively, !46 For example, when YAP attaches down] to up! in (272), down! hacamies. the csubj 
of up] because the. "dot" passes the csubj term.’ Fuethormore, down? :beeumes the rudy of up| bocause thd 


action field specify that down (bound to down/) be merged with-up{boundita up!), 8 
(272) [, ] finite-s -> . csubj chead So Befrnbiattaching 
fsubj: empty 
csitbj: enipty SAGE 28O : ee i a 
==WALL== 
lnp- 
[yam]. 
ldet det] 
(273) [, 1 finite-s -> csubj . chead cee "after attaching 
fsubj: (np- I] i - 
csubj: [np- y} 
I np H 
=WALL== 
[, am] 
Lact ded 


146, The action field could contain an arbitrary LISP expression to be evaluated during an attachment, although by 
convention, the action fields merely update functional roles and syntactic features immediatcly connecied to. nodes. in the. 
buffers. It is not allowed to violate the FS hypothe. (i sae a an este sea tu climinate the action slot by 
classifying to possible uctions) 2 fs (Eee ec ndima bite oe ns 
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The fstr ucture parallels the cstructure in many ways. Just as we prea a ps pointer with every nodc, we 
will associate a predicate pointer with every predicate. ‘When ‘a daughter is attached to a’ predicate,’ the 
predicate pointer is advanced very much like the ps pointer is advanced. Advancing the pointer over a term 
invokes the relevant constraint equations. For example, attaching’ d frcomp to scents, a$ in figuie 11, invokes 


subject-control. ‘That is, the daughter's understood subject is its mother's subject. 


Fig. 11. PS Attach (revisited) 


sentence: John scems to have Ieft. 


input pointer! 
{, John secms} finite-s -> esubj chead . 
seen-1 ->. fkcomp 
“foubj: f, np- - John}. 
[vp seems} normal-vp -> chead . (cobj) icconw) 
seem-] -> . fxcomp 
fsubj: [np- John] 
==WALL== 
[vp- to have Ieft] normal-vp- -> ccomp chead . 
have-1 -> fxcomp . 
fsubj: empty 
[punct A | normal-x -> cword . 


After attaching, up]'s ps and pred pointers will advance invoking the constraint cquations: down/ becomcs 
up!’s excomp and fxcomp, and downl’s fsubj is controlicd by up/. 


[, John seems] finite-s -> csubj chead . 
seem-1 -> fxacomp . 
fsubj: [np- John] 
seems] normal-vp -> chead (cobj) (cxcomp) . 
Seem-1->fxcomp.. tires 
fsubj: {yp- Jabal... Be HE Dbtes vag og depseys 
fxcomp: Lp - to have Ieft] teu Ete 
~ excomp: ie to have lefty ’ © | ae 
[\p- to have Icft] javaiekier =>-cpomp chead , , 
have-! -> freomp:,: 
fsubj: [ 


np- John] | from subject control 


==WALL== 


founct J normal-x -> cword . 
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For another example, there-insertion constraints are enforced when the fobj is attached, using the following 
lexical entry for the verb to be. When YAP attaches the fobj, it checks the ful: if it is the form there, YAP 
senforces number agreement, by merging the num feature of. the subject and object “7 This tule can have 
unbounded consequences since the fsubj can be passed down though a an arbitrary number of raising verbals 


(like seem and /ikely). 


(274) (def-pred be-1 be 
(fobj ob! (np-) Fag 
(action (if'48 (= *there (get-fsubj up)) (mergef (get-fsubj up) down num))))) 


Bresnan-Kaplan’s completeness, coherence, and consistency conditions are implemented using the predicate 
pointers. Completeness is a condition on closing: a node cont hee. until all of its obligatory roles have 
been attached. Coherence is a condition on attaching; a daughter cannot attach unless it is an ‘argument of? its 
mother (or controlled by an argument of its noe) 9 Conyetency, is. a, pondition on unification; 


inconsistent slots cannot be unified. 


x Rpg eee TP TAGS OF 


147. Note the difference between the mergef and merge functions. The firmer ‘merges a particular feature (say num) 
whereas the latter merges all features. An equation like up = down mae Wi Reap hes labotreas only the num feature is 
merged by an equation like mtum(up) = num(down). elo eT ae 

148. The fisp macro iis a simple conditional: it evaluates its secon argument ifthe | first argument returns true. 

149, Argument is a linguistic notion which distinguishes positions selecting Textcal items (John. Mary, the table. ...) from 
forms (there. it. idiom chunks....). The subject of seem is not Gb A SpusiGon  bocause it can lake forms. as in (a). 
Lexical items which appear in that position are not arguments of seem, but naaenofthe xcomp. For example, in (b) John 
‘is an argument of'nice, not seem. patsy 

(a) There seems to be a problem. , 
(b) John scems to be a nice guy. 
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5.5 An Example 


‘The cstructure and fstructure for (275) are listed below. ‘This example is. very similar to Appendix 2 which 
traces the derivation more carefully, - 


(275) ‘The boy was likely to sit? 


(276) CSUBS: [(NP-)'the boy] . . estructure 
CHEAD: [(NP) the boy] : ; 
CSPEC: [(DET) the] ° 
CHEAI: [(N) boy} 
CHEAD: [(VP) was likely to sit] 
CHEAD: {(V) was} 
CXCOMP: [(AP-) likely to sit] 
CHEAN: f{AP) likely to sit] 
CHEAD: [(A) likely] 
CXCOMP: [(VP-) to sit] 
CCOMP: [(COMP) to] 
CHEAD: [(VP) sit] 
CHEAD: [(V) sit] 


(277) FSUBS: [((NP-) the boy] oe ftructure 
FSPEC: [(DET) the] . 
FXCOMP: [(AP-) likely to sit] 
FSUBJ: [((NP-) the boy] 
FSPEC: [(DET) the] 
FXCOMP: [(VP-) to sit] 
FSUBJ: [(NP-) the boy] 
FSPEC: [(DET) the] 
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6. Lexical Transformations 


The traditional -arguinchts ‘for complex tadels (e.g. PG°and?N TNE suggest that ‘simpler ‘mechanisms (like 
YAP) cannot capture the full range of linguistic gencralizations. This chaptér Will addiess this tHiticism.!? 


(278) “It is well known (cf. [Chomsky64]) that the strict context-free grammar modet is not an adequate 
mechanisin for characterizing the subtictics of natural languages. Many of the, gepdjtions, which, 
must be satisfied by well-formed English sentences require some degrac. of agreement, between 
different parts of the sentence which may or may not be adjacent (indeed whigh nay ibe separated 
by a theorctically unbounded number of intervening words). Context-sensigjxg, grammays could 
take care of the weak gencration of many of these constructions, but only:ay ss, cose gf dosing the. 
linguistic significance of the ‘phrase structure’ assigned by the grammar Af, {Postal64}). 
Morcover, the unaided context-free grammar model is unable to show ,the systematic relationship 
that exists between a declarative sentence and its corresponding question form, soctween an active 
sentence and its passive, etc.” : Po 4 piste 4 


Big. rer ©, ‘ 
There has always been some controversy over these arguments; currently, Gazdar [Gazdar79a,b.c] leads the 
i 4 re wa s a ‘ 
opposition. The confusion stems from two very different interpretations of cumplexjty. ; 
(279) linguistic complexity: the size of the grammar itself _ 
(280) computational complexity: the time and space bounds for an ideal processor 


ach ft. 


In general, there is a trade-off between the two types of complexity; tic sie “of a’ progtam (timguistic 
complexity) is typically inverscly related to the power of the interpreter (computation complctity). Woods 
has adopted Chomsky’s view that (279) should be optimized at the expense of 236). Si Gaidar's s. Position is 


Reuse 


150. The following quotation is taken from [Woods70}. He is trying to justify augmenting his ATN model. An 
un-augmented ATN (a Recursive Transition Network RTN) has CF complexity. 

181. Chomsky (personal communications) has said on many occasions that weak generative capacity (computational 
complexity) is completely irrelevant to the study of grammar. However, weak constraints can be used to limit the space of 
possible grammars. For example, if language (weak) is actually FS, then no strictly CF grammar (strong) can correctly 
describe the facts. 
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just the reverse.'5 Bresnan and Kaplan claim that it is possible to optimize ‘Huth (to have your cake and cat it, 
so to speak). YAP was designed along these lines. It has very minimal computational complexity without 
sacrificing Ainguistic generalizations. “This chapter will show how YAP captures many linguistic 
gencralizations, greatly simplifying the grammar. 153" * Chapters 6-9 discus the following topics which are often 


used to "refute" a position like Gardar's 


(281) Lexical ‘Transformations (passive, raising, therc-inscrion, ...) 
(282) Local Structural Transformations (aux-inversion, deletions, ...) 
(283) Wh-movement -(whequestions, relative clauses, ...) 
(284) Conjunction - (vp deletion, sapping. a a) 


iT: his chapter will consider the following four constructions; other lexical rales are very similar. 


(285) raising 

(286) it-cxtraposition 
(287) passive. 

(288) reanalysis 


There is considerable controversy over these rules; we have adopted the lexicalist position which ' ‘compiles” 
the effect of these rules into the lexicon. That is, there are different lexical entries for see and seen; see is a 
transitive verb whereas seen is intransitive. Chonisky advocates a transformational position where passive and 
raising are subcases of move-np. Marcus has encoded Chomsky’s analysis in a deterministic framework. This 


chapter will discuss a formulation of Bresnan-Kaplan lexical rules in YAP’s framework. 


152. Itis widely believed that CF rules arc inherently inadequate (in principic) to describe the facts. Gazdar (and others) 
give very goud evidence to the contrary. It is theoretically possible to describe bette active and pissive scatences with two 
different CF rules. Similarly. it is possible to describe yes-no questions with yet another set of CF rules. Since there are 
only a finite number of transformations and only a finite number of base CF rules. one could apply all the transformations 
to the base. forming a large inelegant (but finite) set of CF rules which describe the facts. Gazdar's derivation could be 
viewed as a constructive “proof that grammar has only CF (computational) complexity. (There are some apparently CS 
constructions to be considered: "respectively" in English, wh-movement in Swedish, subject-verb agreement in Dutch 
verbs. and Postal’s Mohawk puzzle.) 

1$3. Gazdar’s system has meta-rules to achieve the same goals, though his sulution. tends to multiply the number of 
grammar rules by a rather substantial constant. Unfortunately, all known gencral CF parsing algorithms consume time 
Proportional to the size of the grammar, and hence. Guzdar's solation: witkstow: ne rates: time on a rather cabemiial 
constant. 
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6.1 The Lexical/Transformational Debate 


The last chapter demonstrated a lexical formulation of there-insertion (coupled with raising). ‘he understood 
subjects were related to cach other in the fstructure by lexical constraint equations. Chomsky would achieve a 
similar result by representing the understood subjects as traces (empty noun phrases) in the cstructure. 
Instead of using lexical constraint equations to bind the traces, he uses a syntactic transformation called 


move-Np. 


The differences between these two positions are very subtle. We will review one argument for cach side to 
illustrate the flavor of the debate. Neither of these arguments is definitive: there is a large literature of replies 
and counter-replics. ‘The arguments should demonstrate that competence issues (lexical versus 
transformational) are orthogonal to performance. ‘The state of performance modcls is not sufficiently 
sophisticated to distinguish subtle competence issues. [t is doubtful whether performance models can ever 
distinguish certain matters of competence. !*4 Both the lexical and transformational positions are internally 
consistent (for the most part) and equally parsable (Marcus used a transformational approach). We chose the 
lexical position for its very attractive representation of features (described in the last chapter). Although it 
may be possible to devise a similar scheme in a transformational framework, the lexical representation was 


available when YAP was being designed. The debate has concentrated on two points: 


(289) Do move-np rules (passive, there-insertion, raising, ctc.) Ieave a trace? 


John was seen. lexical 
John; was seen t. transformational 


(290) Do infinitives take lexical subjects? 


I believe Lnp- John] l\p- to be a nice guy] lexical 
] believe [ John to be a nice guy} transformational 


154. An extreme functionalist position might suggest that all competence issues are ullimatcly specified by processing 
considerations. This seems most unlikely. 
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The following two arguments debate point (289). 
6.1.1 The Wanna Argument 


The Wanna argument [Bresnan78] demonstrates that there-insertion "must" be a lexical rule since it does not 
leave a trace (an empty noun phrase in cstructure). In English, certain verbals (c.g. want, goitg) can 


optionally contract with the word fo as in (291) and (292). 


(291) I want to go home. 
I wanna go home. 


(292) I'm going to go home. 
I'm gonna go home. 


Want + to cannot contract over a trace. Hence contraction is blocked in (293)}by. the trace of wh-movement, 


but permitted in (294) where the trace does not intervene. 


(293) Who; do you want t; to see Bill? 
*Who do you wanna sec Bill? 


(294) Who; do you want to sce t? 
Who; do you wanna sce t;? 


‘The question is: does move-np leave a trace? Is therc-insestion a lexical rule as in (295) or a transformation as 
in (296)? If there-insertion leaves a trace, then contraction should be blocked as in wh-movement. But 


contraction is permitted, so therc-insertion "cannot" leave a trace. 


(295) There is going to be a movic about us. Peg ors a lexical 
(296) ‘There; is going t; to be a movie about us. “8 __ teansformational 


(297) There's gonna be a movie about us. 
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6.1.2 The Away Argument 


[Williams80} argues that the durative particle away occurs only with intransitive verbs. as demonstrated by 
(298)-(301). | 


(298) ‘The dial is spinning away. Sais a 
(299) *John is spinning the dial away. (wrong meaning). 
(300) John is hitting away at Bill. 

(301) *John is hitting Bill away. 


He then observes that away can occur with lexically derived intransitives (where there is no trace), but not 


with syntactically derived intransitives (where there is a trace). 


(302) John is cating away. lexically derived 
(303) *Whozis Bill hitting t; away. oe ils. . Syuetactically derived 


If passive is a lexical rule, then it should allow away by analogy with (302); if it is syntactic (leaving a trace), it 
should block away as in (303). In fact, away cannot occur with passives, sa qmoyg;pp “must” Ieave a trace. 


(304) *Bill; was being hit t; away by Fred. 


Neither position is conclusive. Having adopted the lexicalist position, we should show how. , jinguistic 
gencralizations can be encoded within the lexicalist framework. PARENTS the ee is ners to me 
procesing fimitations (Mite sé nd deertninis) AS a GRE ms ee 


6.2 Raising 


The fast chapter illustrated a lexical analysis of raising; we will sunwracize the analysis here.. There are two 
types of raising -ritles: raising-to-subjcct (305) and raising-to-obicet (986). If both eases, there is a. raising 
verbal in the higher matrix (c.g. seem, promise, likely, persuade) which determines the type of raising. In. the 
Seenl case (raising-to-subject), the embedded subject is bound to the higher subject: in the persuade case 
(raising-to-object), the embedded subject is bound to the higher object. Bresnan-Kaplan constraint equations 


elegantly capture both cases, !55 


155. The term ruising comes from the old analysis where transformations literally raised the embedded subject up to the 
higher matrix. See [Pustal74] for a defense of the traditional analysis. 
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(305) subj(up) = subj(xcomp(up)) raising- to-subject 


John seems to be a nice guy. 

John promised Mike to be a nice guy. 

John is likely to be a nice guy. 

John struck Mike as likely to be a nice guy. 


(306) obj(up) = subj(xcomp(up)) ... Raising-to-object 


John persuaded Mike to be a nice guy. 
_ John forced Mike to bea njge guy. 
_ John convinced Mike to be a nice guy. 


6,3 Auxiliaries 


YAP analyzes auxiliaries as raising-to-subject verbs; they all select a verbal fxcomp and subject control. 
Unlike raising verbs, auxiliaries select participial is! features whereas raising -verbuls- genceally sclect 


infinitival ss features. 


(307) was Ixcomp going]. : ae ae 4 - auxiliaries 
J have eomp Som). 
(308) I seem [xcomp to go]. raising 


Modals (can, may, will, ...) and do select insless complements, have takes + en, and be assigns either + ing or 
en.” For example, the predicate for be would look something like: 


156. The fs feature takes either tense or participle valucs (since the two have a le dial Se) The 
foeurr vilues are: pres, past, insless, +cen and +ing. 

157. Many analyses separate the two forms of be into an active and a passive entry. Our fuedbulsiion | is more consistent 
with the wait and see philosophy. We chim there is only one copula be which selects an xcomp marked with cither active 
or. passive inflection (4+: ing. + eg}). The wetive.and pussive: iawerprotasions ane: determined! by the donee s ga 
not by the copula. 
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(309) aux-be -> al: vp- Heakp 
al = xcomp(up) 
tns(al) = {+ing, +en} ; 
subj(al) = subj(up) pe hare ©, 


Auxiliaries can nest freely to form sentences like the following: - 


(310) I woutd Rave been taken. 
(311) | would have been taking the ball. 


bel a a” 


There are a few constraints which limit the possibilities. \{-dals and’ do tive fio participa forms (in their 
auxiliary senses)'58 so they must appear in positions requiri.:: present or ‘past ‘infection. In other words, they 
must be directly dominated by a tensed clause because that i the only tensed position. For example, (312) is 
ungrammatical because will does not have a tsless form which would normally be required after wodd: ($13) 


is out for similar reasons. 

are : Le 7 : . eebhr Lage Dos un 

'(312) *I would will have .. be yaa 

(313) *I would do have ... tet; oes Tees 


Even with these constraints, the raising analysis seriously over-generates. One could fix.this problem using a 
small sct of motivated features as in [Akamajian79]. Curr. ::tly, YAP will accept SEURERCCS, like G4). It is 
possible that these should be excluded on semantic or pra::satic grounds like (315)‘ which ave ‘sytitactically 


158. Certain modals are easily mistaken with main verb forms, whi. have very different morphology and distributions. 


"| should! can you for that: - oN fh See tise aes. laa teria 259 Shae 
_ Thad the boys take the exam. Lthig TG oe teed hy 
i coaed Hee be ao Se, Sip teen Ste 


ae cats 


It isa‘t clear how-a parser can duilagueh ile two forms, YAP has. pionnated rites Gemdeiaee F few caics, etc 
ambiguity is a very hard problem. 
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well-formed, though semantically questionable.!9 


(314) *I have been having been having ... 
*- have had had ... 


(315) 271t seemed.to seem to.seem ... 
It is likely to be likely ... 


Except for this problem, the raising analysis is extremely simple and efficient. See [Akmajian79] for a critical 


review of these proposals and some alternatives. 
6.4 It-extraposition 


The raising analysis has a number of manifestations; it has played a crucial role in there-insertion and 


auxiliarics. It also turns out to be impertarit in it-extraposition, Hlustrated by (316)-(318) below. 


(316) It was believed that | would go. 
(317) It was promised that [ would ge. 
(318) It seemed likely that | would go. 


It-extraposition is similar to there-insertion; both cases illustrate a dependency between asubject and a deeply 
embedded constituent. In there-insertion, the "dummy" form there depends upon a deeply embedded noun 
phrase such as a problem: in it-extraposition, the "dummy" if depends upon a deeply embedded clause. — 


159. We could suggest some more fillers to exclude some of the additional cases. For example. Have doesn’t take + ing 
in its auxiliary form. 


[have taken it. 
*1 was having taken it. 


A second condition blocks two adjacent verbs with + ing inflection. 


*[... ting +ing..] 
*T um being being .. 


These filters are merely descriptive; a true theory would explain. these facts. 


I t-extraposition - 98- Section 6.4 


(319) ‘There seemed likely to seem likely ... to be a problem. 
(320) It seemed likely to seem likely ... that | would go. 


YAP uses a similar mechanism in both cases: just as there are lexical entries which check their fSubj slot for 
the form there, there are lexical entries which check for if. Since subjoets:can ‘be raisdd. arbitrarily ‘far, 
it-extraposition can have unbounded consequences. ae eM 
(321) (def-pred be-1 be 
(fobj obl (np-) 
(action (if (= *there (get-fsubj up)) (mergef (gct-fsubj up) down num))))) 


(322) (def-pred likely-1 likely 
ooi(fscamp obl (s-) : be ei: 
(action (if (= *it (get- fsybj: iy acre tee Rabin noni i th ti 


The form it in (323) is co-indexed with the scomp (sentential anal fo. distinguish it from -the 
pronominal it in (324). The two interpretations have different semantics. wr 


(323) It seemed that we were nice. oo (meaningless i) 
324) It scomed tobe nice valet 5 So olacaeeti 


Simita comments apply to there (325-326) demonstrate the diferent semantics of there. 


(325) There was a problem. (meaningless there) 
(326) I went there. (pronomial there) 
6.5 Passive 


Our passive analysis depends on the formulation of auxiliarics as raising verbs. Passive participles do not 
stipulate the auxiliary. It happens that éo be is the only auxiliary that can take a passive participle. "6t This 


160. Note that it-extraposilion merges every feature associated with the subject whereas there-insertion only merges the 
num feature. Hence, it-extraposition uses the merge function whereas there-insertion uses the mergef function. 
161. Except for have, all other auxiliaries block ‘+ en paiticipes’ (they mergé'sdidie other tng feature wiih their ficomp,) 


For some unexplained reason. have blocks passive interpretation of its fxcomp. 


Passive -99- Section 6.5 


purely accidental; passive participles are found in many other constructions without the verb fo be.®2 ‘The 
verb to be is identical in both (327) and (328); the difference is restricted to the participial phrases seeing me 


and seen, 


(327) John was secing me. 
(328) John was seen. 


‘there are two lexical entries, one for seeing (329) and one for seen (330), which are related by a lexical 


redundancy rule to capture the pussive generalization. 


(329) active-see -> al:np- a2:np- 
al = subup) 
a2 = obj(up) 


(330) passive-see -> al:np- 
al = subj(up) 
tns(up) = +en 


In the Bresnan-Kaplan framework, all Iexical entries are "tried" non-deterministically; structures meeting the 
functional well-formedness conditions (coherence, completencss, and consistency) are considered valid 
interpretations. This is a perfectly reasonable competence medal; however, it may have two problems as a 


modet of performance: 


(331) very large lexicon 
(332) non-determinism 


162. Here are three constructions involving passive participles: 


a fallen leaf 
He scemed persuaded to leave. 
T saw a horse taken past the barn. 


There is a considerable literature discussing passive generalizations: our formulation is consistent with the lexical imiallyses, 
although many of the details have not been implemented. 
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YAP uses a virtual lexicon to alleviate problem (30) Instead of storing all the lexical entries literally in a 
. huge array, YAP stores only the core entries: other entries are gencrated upon demand. Vicwing the lexicon 
as a black box, it shouldn't be possibic to distinguish the real entrics from the virtual ones. ‘The virtual lexicon 
is very analogous to virtual memory systems which page address locations into real memory upon demand. 


These schemes take advantage of a space/time trade-off, 63 


Determinism is more difficult to arrange. How can YAP decide which Icxical entry to use? The lexical 
ambiguity problem is extremely difficult. In this case, there are. some fairly good heuristics. ‘The unmarked 
casc is triggered by a + en morphological feature, though there are-several marked rules to disambiguate some 
of the more difficult cases. ‘These rules may seem ad hoc, but they do have to be stated in one way or another. 


Perhaps we will find an explanation someday; for now, we will make do with a descriptive theory. 


(333) John was scen. (the unmarked case) 
(334) John has seen Bill. (perfect construction) 
(335) The horse raced past the barn. (+ en/ + ed ambiguity) 


The horse raced past the barn fell. 


There are two exceptional cases: the perfoct construction (333) and the + en/+ ed morphological ambiguity. 
The perfect construction blocks the passive rule from applying to its complement. This fact is stated in the 
lexical entry for Have. The morphological problem in (335) is disambiguated by the unification procedure. 
The two senses of raced ({ +en, past}) are merged (intersected) with the two senses of a tensed clause ({pres, 


past}) producing a unique result (sce figure 12). 


YAP has a production rule to gencrate a passive predicate pointer when it is needed. It looks something like 


the following, although a number of details have been omitted for clarity.1 


163. Page faults (gencrating lexical entries on the fly) become less and less probable as more and more lexical entries are 
added to the core fexicon. [may be more efficient to include redundant information in the lexicon which is frequently 
accessed, thus reducing the chance of a page fault. In other words, it may be worthwhile to sacrifice some linguistic 
complexity to achieve improved computational complexity. 

164. For example, there has to be a mechanism to prevent the rule from re-applying arbitrarily oflen to the same 
predicate. There is an uninteresting lisp expression in the pattern to accomplish this. 
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Fig. 12. Disambiguating + en/+ ed 
sentence: ‘The horse raced past the ... 


[, the horsc] tns: {pres, past} 
==WALL== 

Wp raced] tns: {past, +en} 
[p past) 

Laer thel 


There is a constraint equation which unifies a clause with its head (the vp). When the head is attached the 
constraint equation is evaluated, disambiguating the tns features. ‘The two senses of raced ({ +en, past}) are 
merged (intersected) with the two senses of up] ({pres, past}) producing a unique result. 


[, the horse raced] _tms: {past} 
[yp faced] tns: {past} 
= =WALL== 

[, Past] 


(336) (defrule passive trans 
(pattern () (= +en)) 
(action (passivize-pred down1))) 


The function passivize-pred transforms downl’s active predicate pointer into a passive one. (It simply 
replaces the fsubj slot with the fobj slot.)'® This should have the same externat‘appearance as though there 
were passive predicates stored in the lexicon. It is merely a space/time trade-off. 


6.6 Reanalysis 


In gencral, prepositional objects do not passivize. For example: 


165. Unfortunately, this docs require copying the predicate pointer. 
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(337) *The ball was gone to. 
*The river was seen at. 
*The boy was taken the ball from. 


However, there are some marked cases where passive is possible. ‘fe account for these facts, it has beca 
proposed that certain verb-particle combinations (¢.g. arrive at and look at) can reanalyze into a single: verb 
complex. ‘The reanalyzed form (338) can passivize, unlike (339), because the solution is a verbal: object 


whereas the station is prepositional object. 


(338) ‘They [, arrived at] [np- the solution]. 
The solution was arrived at. 


(339) They arrived lpp- at the station]. 
*The station was arrived at. 


Since YAP is not capable of distinguishing the scmantic difference between the solution and the bation, it 
cannot distinguish (338) frem (339), When syntactic clues are sufficient as in (340)-(341), YAP correctly 


performs the reanalysis. 


(340) I looked at the picture. 
The picture was looked at. 


(341) T went to the ball. 
*The ball was gone to. 


‘The difference between look and go is stated in the lexicon: Jook reanalyzes with at, but go docs not reanalyze 
with fo. The lexical entry for look at is listed below. Notice that it takes a direct object, not a prepositional 
object. 


(342) (def-pred look-at-1 look 
(fsubj ob! (np-)) 
(fcase obl (p) 
(fobj ob! (np-))) 
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We have scen how a number of lexical rules (raising, it-extraposition, there-insertion, auxiliary formation, 
passive, and reanalysis) are formulated in YAP. ‘This shows that many of the generalizations can be captured 


by a relatively simple device. 
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7. Local Structural Transformations 


The last chapter demonstrated several rules which operate on predicate pointers (fstructure). This chapter 
will discuss structural transformations which operate on constituent structure (cstructure). There are some 


important differences between lexical and structural rules. 


(343) Lexical rules are local in fstructure; structural rules are local in csfructure. 
(344) Structural rules have no lexically marked exceptions. 
(345) Lexical rules are structure preserving. ! 


By these criteria (which are admittedly very pro-lexicalist), it is very hard to find suitable candidates for a 
structural rule. (343) is not very discriminating; as we have scen, it is generally possible to state many rules in 
cither the fstructure or the cstructure. (344) is very pro-lexicalist, since almost every linguistic generalization 
has an exception. Only (345) establishes a class of structural rules; some rules (c.g. root transformations) are 


not structure preserving. !© This section will analyze two root transformations: aux-inversion and imperative. 


The structure preserving property [Emonds76] is analogous to side-effect! free (applicative) programming; 
both moves attempt to establish an invariant representation which remains intact after an arbitrary number of 
transformations (function calls). Linguists have found the invariance notion to be useful for describing 
grammar: computer scientists have discovered invariance important in program verification. It is gencrally 


agreed in both ficlds that structure preserving (applicative) formulations are desirable. 


166. [Emonds76] postulates that transformations divide into two categories: Structure- Preserving Transformations and 
Root Transformations. The former introduce or substitute a constituent C into a position ina phrase marker held by a 
node C2 root Wansformations move, copy and insert a constituent in root clauses. 

107, Actually the case is not so clear: there may be ways to reformulate these (ransformations to be structure preserving. 
For example, [Kaplan and Bresnan 80] present a structure preserving analysis of imperative. 

168. A progrant is said to cause side-effects tf tl modifies data structures ina non-invertible fashion. En general, it is 
possible to avoid: side-effects: there is a school of computer scientists who advocate completely side-cffect free 
programming. This position is somewhat analogous to the lexicalist school of linguists who advocate side-effect free 
analyses. 
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7.1 Aux-inyersion 


Perhaps the best example of a structural transformation is the so-called aux-inversion rule which has applicd 
to (346)-(350),! 


(346) Have | taken the ball? 

(347) wane balls have | taken? 

(348) Never have | taken so many balls! 

(349) Under no circumstances am | permitted to release these documents. 
(350) Nowhere could he find an alpaca carpet. 


YAP's aux-inversion rule undocs the inversion by switching the buffer cells containing the auxiliary and the 
subject noun phrase, thus capturing the linguistic generalization without increasing the computational 
complexity (memory is still severely bounded). ‘The aux-inversion rule inverts downl and down2 as 
illustrated in (351). It also labels up! with the mood feature {wh-q, yes-no-q} to distinguish the sentence 


from its declarative form.!7° 


(351) sentence: Have [ taken the ball? 
input. pyvinter: the ball? 


before after 

[;] I 
==WALL== =WALL== 
[, have] lap I] 

[np- I] [, have] 

[, taken] [, taken] 


A simple form of the aux-inversion rule is shown below.!7! 


169. Only yes-no and wh-questions have been implemented: the other cases shouldn't be too much more difficult. 

170. This doesn't work in the preposed adverbial case. Never have I seen so many balls! Bob Berwick (personal 
communications) has suggested that the inverted forms share a common LF (logical form) interpretation which 
distinguishes them from declarative sentences. . 
171. The fast term of the pattern could be an arbitrary lisp predicate which must be truc_in order for the rule to match. 
In practice, the predicates tend to test features of nodes in the buffer. In this case, the predicate crole-can-advance? is 
testing if upl is looking for a subject. Some details have been suppressed for clarity. For example, there are some 
agreement constraints which will be discussed later in this chapter. 
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(352) (defrule aux-inversion trans 
(pattern (= root) (=auxverb =np- ) (crole-can-advance? upl ‘csubj)) 
(action (invert) (sctfeat upt (yes-no-q wh-q) mood))) 


Aux-inversion is possible when up] contains a root clause!” looking for a subject, and the lower buffer holds 


the inverted auxiliary/np- pattern, !73 This rule was taken almost directly from Marcus’ Parsifal. ; 
7.2 imperative 


Imperative is a delction rule which applics to root clauses.'74 The parser ‘shnply restores the deleted elements 
and finishes the sentence as if nothing: had been missing, Given a sentence like (353), YAP. will insert, the 
words you will into the lower buffer, undoing the imperative transformatign. YAP will finish. the sentence as 
if it had been parsing (354). As in aux: ‘inversion, the transformation adds a mood feature to plete 
transformed sentence (353) from the untransformed sentence (354).,, ‘The rule is. given as (356)-below,!? 


172. The highest clause is a root clause. There are some other instances of Mot phenomena which YAP does not 
currently handle. For example, / said, “what are we going to do?” 

173. The following verbs act as auxiliaries in English: be. have. do, can, will. may, shall. ruin, and pethapé a few others. 
There is another marked rule (described in the next section) which blocks aux-inversion when do and have one American 
English) are being used in their mainverb senses as below: oid 


Have the boys take the exams! (mainverb) 
Whu had the boys take the exams? 

Do it! 

Who did it? 


Have the boys taken the exams? (auxverb) 
Whit have the boys taken? 

Did it bother you? 

Who does it bother? 


It is an unexplained fact that be and the British use of Aave invert (even in the mainverb sense). 
174. [Kaplan and Bresnin80] give a lexical analysis of imperative. 
175. This rule was also taken from Marcus: Parsifal. There is one ¢ difference: his rule me the word you into the buffer, 
not the words‘ ‘you will. YAP wilt | parse (aS tike (by, Parsifi al will parse ie (). 
(a) Be good! 
(b) You will be good! 
(c) *You be good! (wrong meaning) 


YAP drops the will to absorb the tense constraint on root clauses; root clauses are tensed. exccpt for imperatives which 
have no overt tense marker. 
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(353) ‘Take the ball! 
(354) You will take the ball! 


(355) before after 
is] Gl 
==WALL== ==WALL== 
[, take] {yp- oul 
Lact the] ly will] 
fn ball] {, take] 
dct the] 
(n ball] 


(356) (defrule imperative trans 
(pattern (=s) (=v) (and (= tnsless!76 downl) (crole-can-advance? up] *csubj))) 
(action (setfeat up] imperative mood) (drop- -words you will)... : 


7.3 Differential Diagnosis 


It happens that both aux-inversion and imperative have very ‘similar patterns. In cxamples like (357)-(359), 
there is some difficulty deciding which transformation shoidld apply. Some cases, such as (359), are 
grammatically ambiguous, and hence, it is not possible to disambiguate using just the rules of grammar 


(competence). ld 


(357) Have the boys take the ball! imperative 
(358) Have the boys taken the ball? inversion 
(359) Have the cggs fried ... ambiguous? 


A non-deterministic system could “try” both rules, accepting all analyses that happen to work out. A 
deterministic system is posed with a difficult problem; both transformations (aux-inversion and imperative) 
cause side effects which cannot be undone. A deterministic machine has tg. make the right-decision the first 


time; there will be no recovering if it selects the wrong transformation. ‘This section will discuss procedurcs 


176. The predicate = tns/ess tests for null inflection. 
177. The ambiguity may not be realized in performance. Marcus claims there is a strong preference for inversion in ihe 
unmarked case, though the marked interpretation can be forced by semantic and pragmatic biases. 
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for deciding which transformation should apply. 


Marcus believes this problem results from a lexical ambiguity between the two senses of have.'”® ‘The 
auxiliary Have undergoes inversion as in (360) unlike the main verb have (in American English). Hence, if we 
could distinguish the two forms of Aave, we could decide which transformation should apply. Marcus invokes 
a marked rule (360), called Have-diag, to disambiguate siereniay™ between the two senses of have. 


(360) pattern: 
downl: [, have] 
down2: lnp- <any> } 
down3: [cany> <any> ] 


If down3 is tnsless or down? is plural (f first or second Person), : ion marked exception 
then run imperative next! — ak 

Otherwise, run aux-inversion next. unmarked default 
‘The default path (inversion) is taken, unless there is marked evidence to the contrary. ee claims that the 
marked information must appeat in the next.three constituents. _,.H¢.has. some empitical evidence indicating 
that many people cannot disambjguate (361)-(362) because there is no.disamabiguating information, within the 
spccificd lookahead, In (363)-(364), the default; interpretation {ipvarsion) is blocked locally by. the: underlined 
words, and hence, (363)-(364) receive the exceptional interpretation (imperative). 


178. The nttin verb sense of have inverts more frecty in British “_ 

American; Do you have. match? 

British: Have you a match? 
179. The term differential diagnosis was derived from medical upplications. ‘It's believed that doctors have precompiled 
rules to differentiate between medical ¢onditions which have similar sympioms but require, very. different diagnoses, 
[Davis77] refers to these rules as meta-rules because they reason “about rules. This is a very powerful technique, though 
potentially expensive. 
180. Actually, this rule has a slight flaw: it fails to distinguish Have [ eaten? front Have me eaten!. This suggests that case 
features (in addition to person/number) should be used to disambiguate. Neither. YAP nor Parsifal use reflexive features 
to disambiguate. For example, compare: ie 

Have yourself completely taken advant: ige of, for all care! 

Have you completely taken advantage of every chance? _ 
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(361) ty Have] Lb the packages] I; delivered] tomorrow. unmarked 
(362) ly Ilave] [5 the soldicrs] l; given] their medals by their sweethearts. 


(363) Have them delivered tomorrow. marked 
(364) Have the soldiers take their sweethearts to the dance. 


This approach works in a large number of cases. Like other marked rules, it suggests three important 


questions: 


(365) How are diagnostics restricted? 
(366) Is there any empirical support for this approach? 
(367) How many diagnostics will be needed? 


Marcus’ lookahead buffer addresses question (365). ‘The three constituent limit is consistent with the 


ia Although Marcus’ 


empirical evidence mentioned above (361)-(364) and the garden path phenomena. 
approach has these desirable characteristics, there is some concern that a complete grammar would require 
too many diagnostics. Diagnostics are used when there is a Iexical ambiguity that would lead to multiple 
cstructures. The number of diagnostics becomes troublesome when they compare two or more 
transformations at a time, and henee, there may be a combinatoric number of diagnostics. It is quite 
reasonable to place conditions on a transformation one at a time; the problem comes when multiple 
transformations must be compared differentially. It is possible that differential diagnosis may require an 


inordinate number of rules. We will reformulate Marcus’ Have-diag as follows; !82 


(368) Aux-inversion is blocked when any of the following conditions cannot be met: competence 
down] has pres or past inflection 
down! can take down2 as subj (agree in person, number, gender and case) 
down] can take down3 as xcomp (agree in inflection) 


(369) Imperative is blocked when aux-inversion can apply. performance 


181. Like other performance limitations, the buffer length is subject to a certain amount of individual variation. 

182. We accept Marcus’ assumption that no rule can access beyond down3, although additionally, we allow rules to 
access up], up2 and up3. This is a performance limitation on backup/Jookahead. ft seems to be subject to the same 
idiosyncratic behavior that plague other performance constraints (c.g. individual variation). 
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Our formulation has three advantages over Marcus’: | 


(370) Clear separation of competence and performance 
(371) Covers a wider range of cases 
(372) Fewer differential rules 


It is important to separate competence and performance; performance filters such as (369) are gencrally more 
idiusyncratic than statements of competence (368). Performance phenomena are often subject to semantic 
and pragmatic biases, garden path behavior and variation from one inforayant to another. For example, (369) 
is subject to a certain amount of individual variation as: Mascus has observed; it is unlikely that (368) can be 


overruled in the same way. 


Our statement is more genctal than Marcus’; His rute only applics to Adve, our formulation covers all 
auxiliaries, including did and-was as itlustrated in (373)-(376). . 


(373) Who did it? Bens _ no inversion. 


(374) Who did it bother? inversion 
(375) Who was it? no inversion 


(376) Who was it bothering? inversion — 


Thirdly, our formulation requires fewer differential . diagnostics w disambiguate between several 
transformations. These rules are particularly costly because the number of necessary rules grows very quickly 
with the number of transformations. We have factored the sca constraints. from. the differesial 
diagnostics. Modularity is a welcome step. 


It would be desirable to completely climinate differential diagnostics, rules that mention multiple 
transformations. We will propose an alternative formulation that achieves many of the same results without 
the undesirable cost associated with mentioning multiple transformations ina single rule. ‘Traditionally, 
transformational grammarians imposed ordering constraints to block one rule when another can apply. 


Marcus’ scheme is less restrictive than the traditional ordering constraint: he imposes a partial order instead of 
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the more standard fotal order,}83 


Unfortunately, ordering relations are very difficult to formulate, as standard transformational grammarians 
have discovered. ‘There always scems to be an ordering paradox. An alternative formulation expresses the 
ordering relation in terms of features.!84 Suppose that imperative requires more precisely determined 
features than aux-inversion: it cannot trigger while the «ms features (for example) are underdetermined. 
Aux-inversion is less restrictive; it will trigger as long as the ms features are compatible, whether or not the 
other possibilities have been excluded. ‘This will assure that aux-inversion takes precedence, without 


explicitly mentioning both rules in the same diagnostic. 


The ordering mechanism is illustrated in (377)-(378). = 21s tests for a pres or pust feature, disregarding the 
other ims features; =: insless tests for an uniquely determined tis/ess feature. A word like have, which is both 
pres and tisfess ({pres, tnsless}), passes the aux-inversion pattern (377), but fails the impcrative pattern, and 
consequently, aux-inversion will be given first crack. If it should be explicitly blocked (by an agreement 


constraint), then imperative will be given a chance, '85 


(377) (defrule aux-inversion trans 
(pattern (= root) (=?tns =np-) ...) 
(action ...)) 


(378) (defrule imperative trans 
(pattern (= root) (=tnsless = np-)...) 
(action ...)) 


In this way, YAP achieves the effects of differential diagnoses without the associated disadvantages. ‘There is 
a natural separation of performance and competence. The competence idealizations specify agreement 
constraints; the realistic performance model qualifies them with “ordering” relations. We have proposed a 
statement of the “ordering™ relations which may be more robust than conventional formulations. 


Nevertheless, the rule ordering problem would completely evaporate if YAP had lexical (side-effect free) 


183. A total order ts transitive, reflexive, and antisymmetric: every clement is ordered with respect to every other. 
Marcus used a partial ordering scheme (priorities). A partial ordering scheme is not antisymmetric; two elements may 
have the same priority (unordered). 

184. This idea is only partially implemented in the current version, which still contains some differential diagnostics. 

185. The my feature is disambiguated when inversion is blocked. 
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formulations of these transformations. Side-cffects should be avoided whenever possible, especially in a 


deterministic framework. 


This chapter has outlined an approach for capturing logal. structural {eansformations, ‘taken from ‘Marcus’ 
.Parsifal. YAP undocs the transformations by. manipulating he topkahead buffer. .We have discussed-two 

structural transformations and their interactions; Since it..is.. possible to. implement. all of Marcus’ 

transformations in this framework, a simpie device is adequate. far. cypeuring many linguistic goncralizations. 
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8. Wh-movement 


A number of long distance transformations are categorized under wh-movement including: wh-questions, 


embedded questions, relative clauses and topicalization.'% 

(379) Who; did you see xi? wh- question 
(380) | wonder who; you saw x;? embedded question 
(381) I saw a boy who; you know x;. relative clause 
(382) The ball;, Bill took x;. topicalization 


‘These constructions are particularly interesting because the trace (x;) can be arbitrarily far from the operator 


(who;). 


(383) Who; did Bob say that Bill said that ... Mike said I saw xi? 

(384) | wonder who; Bob said that Bill said that ... Mike said I saw x? 
(385) I saw a boy who; Bob said that Bill said that ... Mike said I saw xj? 
(386) ‘The ball;, Bob said that Bill said that ... Mike said I saw;? 


Wh-movement illustrates yct another dependency across sccmingly unbounded distances. Like 
there-insertion, the solution is to find a representation (fstructure) where the dependencies arc local. YAP has 


another grammatical role (fwh) to hold the wh-clement. "7 


(387) There; seems x; likely x; to seem X; likely .. move-np 
(388) Who; did Bob say that Ki Bill said that Xj . a4 move-wh 


There are understood fwh clements in (388) just as there are understood fsubj clements in (387). The binding 
relation forms equivalence classes in both cases. ‘The equivalence property is very convenient for 
computational reasons discussed in chapter 5. All the co-indexed elements are represented collectively as a 
single node, not once for each individual member. Consequently, wh-movement is bounded in fstructure, 


even though it appcars to have unbounded consequences (see figure 13). 


186. Many people object to the topicalization construction. 

187. Our fivh role is like Bresnan-Kaplan’s super-down register, Chomsky’s comp node, Marcus’ wh-comp feature, 
Woods’ hold cell. Although these mechanisms are similar to one another, they do have slightly different propertics. For 
example, YAP's fivi role is passed from phrase to phrase whereas the other mechanisms pass the clement from clause to 
clause. In this respect, YAP’s approach is more like [Koster78] and [Gazdar79a.b,c] which treat all nodes cqually; there 
are no special bounding propertics associated with clause nodes. 
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Fig. 13. Wh-movement 
Who, did Bob say that x l Bill said ... 


xy pred: who 
x pred: do 
fwh: Xx] 
tns: {past} 
fsubj: X3 
fxcomp: x4 
x3 pred: Bob 
X4 pred: say 
fwh: x) 
tns: {tnsless} 
fsubj: x 3 
fscomp: Ks 
X5 pred: say 
fwh: x) 
tns: {past} 
; fsubj: X6 
X6 pred: Bill 


There are some differences between move-np and move-wh; move-np usesIcxical (predicate) rules to bind 
the intermediate subjects whereas move-wh uses structural ¢ps) rules: to bind the interniediate fwh slots. 
Compare (389) and (390);!88 Move-wh is a structural rule because it is constrained by phrase structure fules 
Such as (390), whereas move-np is lexical because it is constrained by predicate rules as in (389). 


188. It is possible to represent these rules much more efficiently using a markcdness theory. For example, the head is 
unified with its mother (by x-bar theory) unless explicitly marked otherwise. 
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(389) seem-1 -> cxcomp:{ap-, vp-} 
cxcomp = fxcomp(up) e 
fsubj(up) = fsubj(fxcomp(up)) move-np 


(390) vp -> chead:v (cobj:np-) (cxcomp:xp-) 
chead = up 
cobj = fobj(up) 
cxcomp = fxcomp(up) 
fwh(up) = fwh(fxcomp(up)) ; Bag : move-wh © 


8.1 Island Phenomena 


Wh-clements cannot be extracted from just any phrase; there are certain “islands” which are opaque to 
wh-movement. Islands are be explained in terms of consistency and coherence in the Bresnan-Kaplan 
framework. Some extractions are blocked betause tte fwh stot is already’ filled (inconsistent) and some are 


blocked because there isn’t a slot to fill (incoherent). 
8.1.1 Wh-islands 


In general, there can only be one extraction from a phrase because the fwh slot only has room for one value; 


multiple values will be inconsistent. Hepce the following, sentences. are, ungrammatical because there are 


inconsistent fwh clements associated with the bracketed expressions. #9 = 


(391) *Who; does John wonder [where Bill saw ti}? 
(392) *What; did you ask me [where you could buy t}? 
(393) *What; did [who sce t}? 

(394) *I wonder what; [who bought t)? 

(395) *What; docs John wonder [where to put t)]? 
(396) *Where; does John wonder [wha:. to put G}? 
(397) *What; docs John wonder [to put t; where]? 
(398) *Where; docs John wonder [to put what t]? 


189. These examples were given in Ken Hale's 1979 fall class at MIT. 
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There are some wh-islands which allow extraction. We have no cxplanatiom for this fact; YAP cannot 
currently naa wh-island violations. This is a very marked premnenee which thight be: tovered by a 


eth peeks 
marked rule,!™ 


(399) ?What does John know how to do? 

(400) What did John ask how to cook? 

(401) ?Here are the books that [| don’t know what to do with? 

(402) 21 just read a book which I can’t figure out why anyone would write. : 
(403) ?1 like the girl that you wonder what John sees in. 

(404) 21 found the book that John couldn't remember what the title of was. 


8.1.2 Ross’ Complex NP Constraint 


[Ross67] observed that extraction is generally blocked by np brackets. as in (405)-(407), (This is an over 


simplification.) 


ne | 
Pee 


(405) *Who; do you know lnp- the man that married t)? 
(406) *Who; did you hear Inp- a rumor that john betrayed t,]? 
(407) *Who, did you find (np- a copy ofa photograph of we 


YAP expresses these facts in the’ np- ps-rute. Most ps-rules piass thé fwh éléirient though constraint equations. 
For example, the vp ps-rule has a tonstraint equation’ ti’ pass ‘thé fw element into its “cconip: 
Swh(up) = fwh(xcomp(up)). There is no such rule associated with ap. Piece an attempt to moye, an fwh 
clement over an np- bracket will be incoherent. This accounts for the minimal ¢ contrast between (408)-(410) 


and the cxamples above. Ch 


(408) Who do you know that John marricd? eae “ fe fh 8 
(409) Who did you hear that John betrayed? : 
(410) Who did you find? 


190. These sentences were given in a recent talk by George Hart al MIT. Sune informants find these sentences perl 
acceptable while others (including the author) find then cxtremely-niarginal:: - 
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There are some more difficult cases. For example, if extraction is blocked by np-, then why is (411) 
grammatical? YAP has a marked rule to cover this case. These picture noun phrases are’still problematic for 


linguistic analysis. ‘The answer appears to involve the specificity of the np-. 


(411) Who did you sec lnp- a picture of ¢)? 

(412) *Who did you see [np- John's picture of tf? 
An account has been provided for both types of islands. We do not claint that these facts follow from YAP’s 
design. Our position is much weaker; we merely claim that these facts are compatible with the.design. Many 


linguists are currently working on a more explanatory theory. 
8.2 Gap Finding 


The really hard problem with wh-movement is finding the “gap” where the wh-clement originated. ‘lhis is 
not patticularly difficult for a nun-deterministic competence theory, but it is (probably) impossible for a 
deterministic processing model. YAP has made some simplifying approximations to the compctence 
idealization which may be valid in a realistic performance model. [n an ideal non-deterministic framework, 


there could be a phrase structure rule like: 


(413) up:gap-np- -> dl:t 
fwh(up) = dl 


Unfortunately, it is very difficult to formulate this rule in a deterministic framework. YAP approximates the 
ideal competence by looking for a gap after the other default ps actions have failed. Find-gap is a new 
default-ps action which is applied after the other actions as in (414). 


(414) attach 
predict 
Close 
find-gap 


This heuristic favors the latest possible gap. Tt corresponds to Fodor's Last-Resort Model of Gap Finding 
{Fodor78]. As she correctly observes, there are some problems with'-this model. - Like other marked 
exceptions (sce chapter 3), there.are some marked rules.to handle the problematic cases. Before suggesting 
some modifications to save the last-resort model, it would be useful to consider some alternatives. Fodor 


proposed three models of gap finding (415)-(417) and ultimately settled on the third alternative. 
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(415) First-Resort ([Marcus79}) 
(416) Last-Resort (YAP) 
(417) Lexical Expectation/Arc-Ordering ((Kaplan72], [Fodor78}) 


The first-resort and last-resort models can be implemented by the. default Ps actions. The first- resart model 
orders find-gap first whercas the last-resort model orders it last. 


(418) First-Resort  —_ Last-Resort 


find-gap attach 
attach predict 
predict close 
close find-gap 


The first-resort and last-resort madcls do not exclude lexically. marked, cases; they merely suggest an 
unmarked default. In.some sense, the ane arcerng, srateey, denies since correlations it explicit, lists the 


Beara 


tose 


although it may be civerniled by lexical marking i in certain cases. ete pane i, 3 
8.3 Evidence for the Last-Resort Model 


(419) I gave the boy who you wanted to give the books to three books. 


Sentence (419) is unacceptable. sad Grammatically speaking, it is s extremely ambiguous; there are no 0 less than 
four possible gaps as shown in (420). 


19]. A set is random when the shortest description explicitly lists each of its members. 

192. Arc-ordering is often formulated within a depth first (DFS) control structure. The DFS is in fact imposing a 
Structural constraint; it encourages low attachment. In Marcus’ non-deterministic framework, these structural correlations 
have tu be'stated elsewhere. The define psactions sectii tu be a ress she place. 

193. Pussible gaps are shown in pientheses.. :-Plus (4) and minus (—)dadicate rebitive processing difficulty.. The mare 
acceptable of the pair are marked with a plus. 

“194. Of the 40 test sentences'in [Marctis79, Appendix DJ; this is the only one thut YAP canhot parse.” (Some informants 
find the last gap acceptable as in: / gave the boy {who you wanted (a, give she. books.to yf three books. This strutcgy. is not 
incompatible with the Last-Resort Model, although it would require a slight modification.) 


shore AT 
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(420) #1 gave the boy who you wanted (t) to give (t) the books (1) to (t) three books. 


Why is it so difficult to find to find the gaps? The fast-resort modcl prefers to attach lexical material over gap 
finding and hence it misses all the gaps. This unacceptable sentence is very ‘supportive of the last-resort 
model but rather damaging to the first-resort model which can casily (") find the first gap. The cxamples 
don't need to be so extreme. We have already seen a garden path sentence (421) also favoring the last-resort 
‘model. : (422) shows that these GPs are fairly productive. . 


(421) #1 told the boy the dog bit Sue would help him. gu 
(422) 791 called the guy who the car was smashed up by a rotten driver. 


Corollary (423)! immediately follows from the last-resort model: np gaps are extremely marked!™ in 
positions immediately before lexical noun phrases. The reason Should be obvious, the last-resort model 
prefers attaching the lexical noun phrase over creating the gap, unless there i is positive evidence (i. e. semantic 
clues) to everpule the default. ‘This corollary accounts for the badnoss of ¢421) and (422).. ‘Two of the pau 
gaps in Ley are also excluded under this corollary to the iene eee . 


(423) The ‘Trace- NP Corollary: In the unmarked case, #[... t; NP ...], where t; is bound to a noun 
phrase. 


This corollary correctly predicts preferences in double object constructions. The lexical noun phrase is 
gencrally interpreted as the first object unless there is positive evidence to the contrary. Even then, the 


marked interpretation is generally less acceptable. = 


(424) + What did I give the boy t? 
+ Who did I give the book to t? 
— Who did | give t the book? 


(425) + What did you call a drunken sailor ? 
— Who did I call ta rotten driver? 


195. The corollary has been stated as a processing filter quite analogous to the competence fitters of [Chomsky and 
Lasnik77]. Filters are a convenient method of describing the facts. but they are probably inadequate as explanations. In 
this casc, we cannot explain why last- resurt scems to be the unmarked case. 

196. There are at least three productive * “counter-examplcs" to the corollary where the filter is inoperative. We will turn 
to these cases svon. 

197. The marked interpretation is excluded from certain dialects. 
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(426) + What do I consider John t? 
— Who do I consider t a fool? 


(427) + What did I tell the boy t? 
+ Who did I tell the story tot? 
— Who did 1 tell t the story? 


‘The last-resort strategy is consistent with the ‘Trace-X Filer (428). which is simiJar to.constraint (429).1%8. The 
constraint predicts that a trace of category X cannot appear just before lexical material of category X. 
Sentences (424)-(427) are consistent with this generalization of ‘the Trace-NP Corollary, ‘Unfortunately, there 
is little evidence in English to justify the move away fromthe ‘I'face-NP Corollary. (The Crucial evidence 


comes from French.) 
Ce) ‘The trace-X Filter: In the unmarked case, # |... i X. i where t is a trace of category xX. 


(429) The XX Extraction Constraint: If at:some point in its derivation a sentence contains a sequence of 
two constituents of the same. formal type, either of ;which. could be moved..or deleted by. a 
transformation, the transformation may not apply to the first constituent in the sequence. 

_{Hankamer73]}. a, oe; ai 


Although the last-resort strategy has many of the right characteristics, there are also many problems which 
-fequire marked rules, We will consider the following three peobloms here:)% :.:, 


(430) Ambiguity 
(431) Lexical Marking 
(432) Length 


198. Hankamer proposed that the XX Extraction Constraint belongs i in competence... Since it can be violated (in the 
marked case), we prefer to place it in performance. [Fodor78] also views the consiryint 4s a processing matter. 

199. It has been suggested that cleft sentences like, What / wanted that for nobody could understand, form another class 
of marked exceptions to the performance filter. ; 
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There are some ambiguous sentences which strongly resemble the pscudo-attachment case. In the 
pscudo-attachment case, there is a lexical xp- with two. possible mothers. . Pscudo-gap is. cxactly analogous 
except the xp- is a trace. 


(433) Put the block in the box on the table. pseudo-attachnent 


(434) Who do you want (t) to cat (t)? pseudo- gap 
(435) The duck is too old (t) to cat (t). | 

(436) Who did Mary promise (t) that she would marry (t)? 

(437) ‘To whom did Father say (t) that he was planning to write (t)?- 

438) Where did he say (t) he was going (t)? aad - 

(439) When did he say (t) he was. going (t)? 


Only (434) has been implemented, though the others shouldn’t be much more difficult. Pscudo-gaps have 
many of the same problems as pseudo-attachment. It is (probably) impossible to find all the gaps in sentences 
like (440). YAP settles for the first and last possible gaps as in (441), in the absence of disambiguating 


information. 

(440) Who do you want (t) to want (t) ... (t) to want (t) to eat (t)? 

(441) Who do you want (t) to want ... to want to eat (t)? 

8.3.2 Lexical Marking 

The unmarked case can be overruled by the lexicon as in (443). These cases have not becn.implemented. 


(442) + Who did the teacher walk to the cafeteria with? unmarked 
— Who did the teacher walk to une cafeteria? - 


(443) — Which book did the teacher read to the children from? 7 lexically marked 
+ Which book did the teacher read to the children? 
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Even though read and walk have the same subcategorization features (they both sclect an optional object and 
a verbal Saeeer adh have different PSEeee as luatrated — seu and snus This Suen’ is often 
inmptications associated with that pelation wHich’ are incompatible with’ the framewinr pee here; in 


particular, arc-ordering is crucially non-deterministic.™ 
8.3.3 Length 


Notice that judgments are less and less sharp as the second object increases in length. This is completely 
unexplained by our account. There are other length phenomena (such as ea: y np shift) which are more 
widely accepted. We seem to be missing a. gencralization., However it isnt clear] how to capture the leagth 
phenomena. [Frazier and Fodor78] used a front end filter (PPP) which divided chucks into: rouaniy® six words. 
Although this is an interesting proposal, it isn’t clear how it could be' implemented: ee 


200. [Rich75] gives a critical review of the arc-ordering position. In his opinion: 


Linguistic Pt “ ‘tional Mechani 
Center-embedding single-place HOLD list wrong 

Preferred readings of - ordered ttyingof © ~ ifladequate *~ 
Ambiguous Sentences alternatives (arcs) 

GP sentences back-tracking nas somewhat right 
Perccived Complexity HOLD list costing inconclusive 
Differences. arc counting “ 


His arguments are very convincing. One could view YAP as a DFS which only backs up after it takes a very serious GP. 
(We haven't implemented a GP recovery procedure yet. but backup would be the casicst way to do so.) A sentence is 
unacceptable just in case it causes YAP (as modified) to backup. This is a precise definition. The problem with the 
arc-ordering position is that backup describes both crashingly unacceptable GPs and extremely subtle preferences of 
ambiguous sentences. The sharpness is not related to any measure of backup that has been proposed. We suggest that 
subtle preferences have a very different explanation from GPs, 
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(444) # Who did you call t it? 
Who did you call t that? 
Who did you call t a rotten driver? 
Who did you call t the worst driver that you ever... 


8.4 Summary 


We have discussed four cases of wh-movement: wh-questions, embedded questions, relative clauses, and 
topicalization. Movement constructions suggest some interesting’ topics in both competence and 


performance. 


(445) Competence: locality principles & island phenomena 
(446) Performance: gap finding 


We have shown that “unbounded” movement phenomena are local using an appropriate representation, such 
as Bresnan-Kaplan’s fstructure.20! Locality is extremely convenient for processing because it enables YAP to 
apply movement rules without approximation. If the rules were truly non-local they would require 
unbounded memory and hence we should expect to discover empirical discrepancies from the competence 


idealization. However, since the idealization is local, there need not be any empirical discrepancies. 


The locality issucs are extremely complex; we have only addressed a few cases. Much of the linguistic 
discussion deals with islands which are opaque to wh-movement. ‘These islands should have a natural 
formulation in our representation (fstructure or move-alpha*). We have given an account (more or Icss) for 
two types of islands: wh-islands and Ross’ Complex NP Constraint. ‘This is still an active area of linguistic 


inquiry. 


201. Healso is possible to represent movement locally in Chomsky’s framework, using equivalence classes. We have 
previousl suggested that Bresnan-Kaplan’s merge operator (=) is an equivalence relauion, ATL the nodes which have 
been merged together (co-indexed) form a single equivalence class (index), which is represented as a single node in 
fstructure. For example. in the raising case (move-np). all the understood subjects are co-indexed inte a single node in 
fstructure. Similarly co-indexed traces in comp (/iva in YAP) are also a single node in fstructure. 

Using the same basic approach, we could represent movement locally in Chomsky’s system. Let move-alpha be a 
relauion between two phrases, and Jet move-alpha* be the transitive. symmetric and reflexive closure of move-alpha. 
Move-alpha*® is similar to Bresnan-Kaplan’s merge (=) operator: tt too defines equivalence classes corresponding to the 
index. The claim that movement is local in’ tstructure vorresponds to a claim that movement ts local on indexes 
(equivalence classes under move-wpha*). 
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The most difficult problem is finding the gap. We have argued for a last-resort model. It is consistent with 
some garden path data and Hankamer's XX Extraction Constraint, although it it does have some problems. The 
most scrious problem is lexical marking. It was suggested that n marked rules, squid. apply i in the.crucial. cases, 
although the proposal has not been implemented. There also appear to be some length effects, which are also 


unexplained. We outlined a partial solution to the pscudo-gap phenomena. 


Despite these problems, we have implemented a simple device which captures many of the wh-movement 
phenomena. This result?" Considerably weakens the traditional view that Processors “must be ‘Turing 
Equiv alent. The next chapter will iMustrate a “simple” mechaitism For | parsing many conjunction phenomena, 


which were also believed to require inordinate resources. 


202. Many other rescarchers have designed “simple” devices to captire wh-movement. Sec ‘[Marcus79]_ and 
[Gazdar79a,b.c] for two examples. 
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9. Conjunction 


Conjunction has been onc of the most difficult constructions. 40 parse, bqcause there seem. to be so many 
possible alternatives. Conjunction is a very good test of the FS hypothesis. How can we approximate the 
ideal ‘ Competence modct so that a FS processor cah ii cot ‘we'V ve made’ some  imprense i initial 
progress, although there is stiff gubstantldl work to'bd ‘adie. : epee Pee 


9.1 Simplifying Assumptions 


Many parsers have found conjunction difficult because they consider too many possibilitics. It is extremely 
important to consider as few alternatives as, possible. We. will impose several. very strict limitations on 


conjunction in order to limit the scope of the probicm. All of these restrictions are controversial. 
9.1.1 ‘The Constituent Assumption re 


(447) Assumption: Conjunction applies to constituents, not to arbitrary fragments. 


(448) The scene [of the movic] and fof the play] was in Chicago. 
Which [boys] and [girls] went? 
[Which boys] and [which girls) went? 
Which boys [went to the ballf a and {took the jar}? 


A is 


Although (447) is generally accepted, there have becn some objections. Sentences like (449)-(450) have been 
used to argue that conjuncts may not always be ‘constitierits:” We’ wilt’ Aigue that despite appearances both’ 
(451) and (452) are constituents. 


ra ; na 
(449) John [drove through] and {completely demolished} a plate gliss window. - [Woods73] 
(450) Mary [expressed costs in dallara] and. [weights in. pounds}... b-:. , oi e : {Martin80} 


(451) [vp drove through [np- ]] 
(452) vp [, ] weights in pounds} 


‘The constituent assumption is very convenient for processing, as we will see. 
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9.1.2 The Category Assumption 


(453) Assumption: Fach conjunct has the same category. | 


This assumption is also fairly standard, though there have been arguments to the contrary. (Martin80] 
provides the following “counter-example”. (455) is his analysis; (456) is.our own, 


(454) We expect difficulties now and in the future. : 
(455) We expect difficulties Lip- now] and op- in the future]. Martin 
(456) We expect difficulties lop- now] and [55- in the future]. re YAP. 


In this case, it seems reasonable to call now a Prepositiohat aes This i is a small cost to pay to save the 


category assumption. 
9.1.3 The Across-the-Board Convention 


(457) Assumption: Each conjunct has the same number of wh-gaps. Furthermore, the gaps have the 
same category.3 


The last three assumptions can be summarized in Gazdar-Notation™ "as (458) (The comparative 
construction illustrates the need for some morc categories, @ @ and qr) to represent quantifiers. 


Comparatives have not been implemented.) 
(458) Assumption: Each conjunct has the same Gazdar-Notatign. 


(459) *John is easy [ 
John is easy [ 


to please] and [,, vp- to love Mary]. 
to love]. 


vp-/np- 


to please] a and [ 


vp-/np- ” ¥p:/np- 


(460) “Phe man who [, 7,5. Mary loves] and [, Sally'hates George} conipwted my tax. 
The man who [, /np- Mary loves] and [s/np- Sally hates] computed my tax. 
(461) The kennel which [, /np- Mary made] and [, /np- Fido sleeps in] has been stolen. 


‘The kennel in which [, /pp- Mary keeps drugs and ls/pp- Fido sleeps] has been stolen. 


203. IL seems that the gaps have to be identical in every respect, not just category. That is, they have the same reference, 
person, number, gender, case, inflection, etc. 

204. In Gazdar-Notation, X/Y refers to a node of category X containing a gap of category Y. 

205. These examples are taken from [Gazdar79c}. Tough movement and comparative arc not currently implemented. 
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“The kennel (in) which-[¢ p+ Mary made} aad {y,,3,.' Fido sleops} has becn stolen. 


ERs. : 
ene ars eae 


(462) John saw more horses than coke wie or _ Pete talked to]. 
: sy 


9,2 Simple Cases 
In the simple case, the conjuncts happen to be in up] and down2 as below. 


(463) Bob and Bill saw it. 


(464) [, Bob] es 
[np- Bob] first conjunct 
= =WALL== woe f 
Leonj. Sad] 
[saw] biack cag Sele: 


Second conjunct 


Conjunction is possible in (464) because down] is a conjunction and up] and down? are constituents of the 
same category with matching gaps. There is a marked rule which looks for this pattern. i. 


9.2.1 Attaching Conjuncts 
py 
Attaching conjuncts is different from other types of attaohment; there is a special slot in cstructuse nodes, = 


conjuncts. ae pe ee 


(465) np- -> np- conj np- standtird wD 
(466) ls fnp- np- Bob] and lnp- Bill}) ... a ye 


(467) np- -> chead:np cxcomp:{ vp-} cxcomp: {s-} cconjuncts:np- YAP 


(468) [, hnp- Bob and np- bill]. 


Using the aiitied approach, YAP couldn't attach Bob to the rot because thore might be ‘a conunction node 
in between. Consequently, attachment, wouldn't be possible until ‘the right ‘edge has been read, But this 
would prevent Carly closure (A-over-A closure principle): Loscral neinode. could be attached ‘until all of its 


descendants have been completed. This is very unfortunate. YAP's approach avoids this problem because 
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there are no nodes between the. first-conjunct Boh. and. the #eot, and ‘hence, attachment is possible before 


conjunction is considered. 


“peg 


After attaching the conjuncts, lnp- Bi bash wil fill te eons slot of hp p: Rodd and the machine state will be: 


(469) [, Bob and Bill} 
Bill] 


‘The sentence will now be parsed as if[,,_ Bill] is the subject.27 


np 


9.2.2 Attention Shift 


The approach just outlined works on (470), but fails on (471) where minimal attachment is misleading. 
Fig, 14. Attention Shift 


‘sentence: | saw Bob and Bill saw me. 
input pointer: me. 


before after 

[, ! saw Bob} panes 
(vp saw Bob] fp saw Bob] - 
[np- Bobl [np- Bobl 
==WALL== conj and] 
leonj and] ==WALL== 
lnp- Bill] Inp- Bill] 

[, saw] [, sav] 


206. Yet another alternative would use the standard phrase structure rules, It would attach the first conjunct as if there 
were going :to be a conjinction,, The second conjunct would then be Choswky Adjelaed when it is discovered. (This may 
be a notational variant of the current implementation.) : 

207. There is only one difference: fhp _ Bob and Bill] is'ptural whereas lnp- Biff} i$ singular. YAP's Solution assumes that 
all of functional features ire inherited, dn. fact, umber vidaes are notdaherited in the -usuak way. YAP actually replaces 
the number value in this case. There is a more attractive solution to be found. 


“t 
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(470) | saw Bob and Bill. 
(471) I saw Bob and Bill saw me. 


The solution is to shift the attention" of the machine past the and building Bill saw me bottom up. ‘Then the 
machine will return its attention back to the conjunction and finish the sentence as if [, Bill saw mc] came 


prepackaged as a single unit. 


YAP shifts its attention by moving down! into the upper buffer as in figure 14. Attention return is just the 
inverse; YAP moves up! back into the lower buffer.2 ‘Tae technique is very gencral; it allows bottom-up 


chunks to appear prepackaged. Attention shifting is hcavily-used- to parse noun phrases. 


There are some important: issues concerning the order of attention shift and return in the default ps rules. 
Return is last. It isn’t clear-where shift should be; Marcus ordered it first,2!° we've ordered it much later. The 


issucs arc not well understood; we're not prepared to make a coherent argument.2!! 


(472) YAP Marcus! Parsifal = 


attach attention shift 
predict find gap 
attention shift attach 

close predict 

find gap close 


attention return 


attention return 


208. The terminology is taken from [Marcus79] who used a similar technique to parse noun phrases. 

209. There are two registers associated with cach node (as-sfatus and us-return-status). which’ prevent infinite attention 
shifis and returns. The details aren't very interesting. 

210, Marcus’ attention. shift mechanism. was conditional on -catcgory lype..,.Parsifal would attention shift for noun 
phrases. but not for verb phrases of prepositional phrases. Our mechanism applies to all categories. 

211. The ordering of actions in Maret Parsiful is partly defined by the imerpretef (attention ahifl and return) and partly 
implicit in the grammar (attach, predict, close and find gap). The implicit order may be incorrect; it is our own 
interpretation of his grammar. 
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9.2.3 Closing 


After attention shifting to parse [, Bill saw me], the machine st state is ce (Icft ads) he machine will ihe 
He : “oe st et 


close up! repeatedly until conjunction is possible. 


(473) before after closing once aficr closing twice . 
[, | saw Bob] 
vp saw Bob] [, | saw Bob} 
lnp- Bob] lp saw Bob] pee [, | saw Bob} 
==WALL== setWALLS== © 0 ==WALL== 
leonj and] conj and] lconj and] 
Is] [5] i} 
Inp- Bill] [np- Bill} lap Bill] 
[, saw] [saw] [ saw] 


Conjunction applies just as it did in the simple case, / saw Bob and Bill. Down2 fills the cconjuncts slot of 


up], Icaving the machine in (474). The rest of the sentence arses just like the simple sentence, Bill saw me. 


(474) {,] 
==WALL= = 
lnp- Bill] 

[, saw] 


9.2.4 Summary of the Simple Cases 


I.ct us summarize the simple conjunction rule. First, the machine attention shifts for the non-minimal 
attachment case (476). In the non-minimal case, YAP will predict an s just before Bill saw me. Then YAP 


will return attention to the and. 


(475) 1 saw Bob and Bill. 
(476) I saw Bob and Bill saw me. 


Secondly, YAP tries to attach conjuncts, if pessible. Upl and down have to be constituents and should 
match in category. and-gaps. Finally, if that doesn’ t work, YAP eo cloeo wp. 
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(477) Attention shift 
(478) Attach-conjuncts 
(479) Close 


This approach has some problems. It finds only the lowest attachment, not the full range of ambiguous 
possibilities. YAP should pscudo-attach conjuncts in ambiguous cases such as (480). It should be possible to 


? implement pscudo-attachment in these cases, but the details have not been worked out. 
(480) Bill told Bob [that Mike told Harry] and [Sam told Jack]. 


‘There are more difficult cases where pscudo-attachment is not a likely solution. It is not clear how (482) and 


(483) could be represented in a single structure. Even worse, YAP prefers the unlikely interpretation (483) 


because Bill left builds a clause bottom-up. ° 


(481) | know Bob and Bill left. 
(482) I know [Bob and Bill] left 
(483) [I know Bob] and [Bill left] 


‘The gencral approach has been very effective although there are many problems to be solved. 
9.3 Deletions 


It is possible for one of the conjuncts to contain'a deleted clement. In the gapping case, the verb in the second 


conjunct is deleted; in right node raising, an object in the first conjunct is missing. 


(484) Bob saw Bill and Sue Mary. gapping 
(485) Bob looked at and Bill took the jar. right node raising 


Both of these constructions appear to violate the constituent assumption. With a delction analysis, though, it 


is possible to save the constituent assumption. As we have suggested, (484)-(485) will be analyzed as:2!2 


212. Right node raising is usually analyzed as: 


[Bob louked at (,] and [Bill took ti] [the jar}, 


‘Our analysis is simpler to implement. Although this alone isn't a valid reason to prefer one analysis over another, there is 
sufficient controversy over right node raising that it didn’t seem worth the effort to implement it precisely. 
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(486) [Bob saw Bill] and [, Sue [| ] Mary] 


(487) [Bob looked at lnp- J] and [Bill took the jar] 


9.3.1 Right Node Raising 


YAP has a marked rule to parse right node raising. When there is a conjunction (c.g. and) in downl and up] 
can't close, then YAP assumes right node raising. ‘The analysis crucially depends on the constituent 
assumption; if a conjunct is not a complete constituent, then by assumption the rest must have been deleted. 
Having detected the deletion, YAP undoes the transformation, inserting an empty noun phrase back into the 


buffer as in figure 15. 


The analysis has some problems; it docs not bind the empty noun phrase to an object in the second conjunct. 


YAP would erroneously accept ill-formed sentences such as (488). There is some controversy over the 


appropriate binding mechanism; it isn’t clear if it is movement as in [Gazdar79c] or anaphoric.23 


Fig. 15. Right Node Raising 
sentence: Bob looked at and Bill took the jar 


before after 
[, Bob looked at] [, Bob looked at] 
vp looked at] [vp looked at] 
==WALL== ==WALL== 
Lonj andi np- | 
np- Bill] lconj and] 
[, took] [np- Bill] 

[, took] 


213. Itis generally agreed that the subject of drink is anaphorically bound in the following cases. 


Drinking gin can be fun, 
It doesn’t require a glass to drink gin. 
Having drunk gin all day, | was completely wasted. 


There are several inportant differences between anaphoric control and movement. This paper though will not discuss 
bound anaphora. 
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(488) *1 took and you went. 


Optional arguments illustrate another problem. YAP will detect only obligatory clements which have been 
deleted; optional clements are also-subject to deletion. YAP-will not-detect an object of ate in (489). 


(489) | ate ([,5- }) and you drank everything they brought. 


np- 
9.3.2 Gapping 
“Gapping” is the case where the second conjunct’s verb has been deleted. (490) is a simple example. 


(490) [Bob saw Bill] and [, Sue [, ] Mary] 


YAP parses these by undoing the transformation. When the lower buffer contains a conjunction followed by 
two noun phrases, YAP inserts an empty verb into the buffer. To exclude intcrprctations such as-(492), YAP 
merges the predicates from both conjuncts. Sce figure 16. oh ot 


Fig. 16. Gapping 
sentence: Bob saw Bill and Sue Mary. 


[, Bob saw Bill] before transformation 
==WALL== 

lconj and] 

np; Suc] 

Inp- Mary] 


[, Bob saw Bill] after transformation 
==WALL== 

Econj andl 

lnp- Sue] 

ly] sec-] -> fsubj:np- fobj:np- 

[np- Mary] 
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(491) Bob persuaded Bill to leave and Sue Mary. 
(492) *Bob pemiaded Bill to leave and ( Bob eS Sue Mary. 


sae 


The implementation is not as general. as it should bc: the verb can be. deleted :in many other contexts, YAP 
can find a deleted verb in any projection of v (in vp, vp-, s and s-). For example, YAP correctly parses (493). 
Unfortunately, it finds only the lowest possible interpretation: it wil! hut’ discover (494) unless there is some 
positive reason (i.e. semantics) to reject (493). ‘The gapping pattern crucially depends on two noun phrases; it 
will not detect gapping when the second object is an xcomp as in (495)-(497). Aside from the ambiguity 
problem, these problems shouldn't be too difficult to correct. ‘The simple cases of gapping were implemented 


to show plausibility within our restricted framework. 


(493) Bob [gave Bill a ball] and [vp Sam a jar]. 
(494) [Bob gave Bill a ball] and [, Sam a jar.] 


(495) Bob persuaded Bill te leave and Sam Lp to stay}. : 
(496) | expressed costs in dollars and weights [,,,. ia pounds}. 


(497) I considered Bill likely to win and Sam Lap- likely to lose}. 


9.4 Summary 


In summary, we have presented a simple approach to parse many conjunction constructions including some 
cases of right node raising and gapping. Although there are many problems to be solved, these analyses 
indicate that it is plausible for a FS deterministic processor to parse conjunction. This discussion responds to 


Cie 
the traditional arguinents that a I°S processor cannot in principle capture the conjunction gencralizations. 


We have previously suggested that closure actually simplifies conjunction. YAP uses closure to find the first 
conjunct: it will continuously close off up] until the first conjunct is in upl. Furthermore, closure assures that 
all possible conjuncts will be in the a buffer; this makes it much casier to pscudo-attach conjuncts sinee it 


is casy to find all the possibilitics.224 


214. YAP docs not currently pscudo-attach conjuncts, although it was designed with this in mind. 
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10. Conclusion 


We have hypothesized that a computationally simple device is Sufficient for processing natural language. By 
incorporating two processing constraints, FS and Marcus’ Determinism, i it was possible to construct a parser 
which approximates many competence idealizations, YAP Was designed to fail precisely where the 
idealizations require unrealistic resources. YAP's success, as far as it goes, provides some evidence for the 


~ hypothesis.. 
- 10.1 The Traditional Position . 


Traditionally there have been many arguments for coinputationally complex models of natural language. 
Much of the carly literature, though, docs not refute our hypothesis, but mercly cast doubt on its feasibility. 
Admittedly, it is casier to find descriptions using more powerful (complex) techniques, but is it necessary to 
use more powerful techniques? The traditional arguments are extremely negative; if the problem is really as 
hard as they suggest, then the only solution is to grin and bear it. It is easy to show how hard a problem might 


be, but it is a real accomplishment to find a simple clegant solution. 


ee s carly arguments are rightly cautious; they do not exclude the pent of a cS processor. He 
as more eine ‘Over the’ years, oui his posidin has Been aimiauenweed & asa cathe schungan of 
FS approaches. It is merely a feasibility argument. ‘To a certain extent he is correct, [Chomsky56, pp. 113} 
“the grammar of English is materially simplified if phrase structure description is limited to a kernel of simple 
sentences from which all other sentences are constructed by repeated: transformations; and that this view of 
linguistic structure gives a certain insight into the use and understanding of language.” Hence, compctence 
idcalizations should use poworful devices. However, cig rege nol teay that language should be ase ceigr by 


exactly the same machinery 


‘This is a very common situation in engineering. Engincers devetsp ideal models to gain fruitful insights they 
do not cxpect:their model to perfectly teplicate the real. world. :'They: will ae the theory as far as it gocs, and 
then joke about "Murphy’s Law”. “Idcalizations ; are very usefal, but they can ‘t ‘be taken too’ seriously; they 
simply don’t work in all: cases. Physical machines do not behave ideally. : 
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[Chomsky56] provides a “counter-example” to FS models. It generates arbitrary center-embedding and 
hence it is beyond the generative capacity ofa FSM. Since his counter-examples are grammatical (part of the 
ideal competence model of language), this proves that a ‘FSM cannot process competence. ais However, it is 
well- “known that arbitrarily decp center-cmbedding is universally ‘unacceptable, and hence, Chomsky’s 


arguments do nof apply to performance. We have no reason to exclude the possibility of a FS parser. 


He correctly suggests that a parser should encode a simple and "revealing" grammar. \t is not clear how this 
can be accomplished with a simple device. YAP introduces a number of approximations (i. ¢. bounded stack, 
finite lookahead, ...) in order to approximate an elegant (though complex)’ ‘competenec gramniar with 


reasonable resources, Chomsky has questioned this move for two reasons:7!6 


215. “Turning now to English, we find that there are infinite sets of sentences that have dependency sets with more ‘than 
any fixed number of terms. For cxample, let S}. Sp, ... be déclarative sentences. Then the dee are all tial 


sentences: ua os 


gd 3) (i) If S}. then S». 
(ii) FitherS 3 OF Sq. 
(iii) The man who said that Sg, is arriving today. 


These sentences have dependencics between “if~then’, ‘either’-‘or", ‘man’-'is. But we'can choose S},S3, 55 which appear 
between. the inéerdependent words, as €13i), (13ii), or (1 3ii) themselves." [Chomsk y56-pp, L15} 

216. "Although we have found that no finite-state Markov process [YAP] that produces sentences from left to right can 
serve as at} English grammar [competence]. we might inquire into the possibility of constructing a sequence of such 
devices that in some nontrivial way, come closer and closer to matching the output of a satisfactory English grammar. 
Suppose, for example. that for fixed a we construct a finite-state grammar in the following manner: one state of the 
grammar is associated with each sequence of English words of length n [ordered by statistical frequency] ... as n increases, 
the output of such gramntars: will come to fook more and mure like Rygliste ....: This: ficl-has occasionally -led: to: the 
suggestion that a theory of linguistic structure might be fashioned on such a model ... 

Whatever the other interests of statistical approximation in this sense hiay be, it is clear that it can shed no light off the 
problems of grammar. There is no general rckition between the frequency of 4 string (or its component parts) and. its 
grammiaticalness ... there is no significant correlation between order of approximation and grammaticalness. If we order 
the strings of a given Iength in terms of order of approximations to English, we shall find both grammatical and 
ungrammatical strings scattered throughout the list, from top to bottom. Hence the notion of statistical approximation 
appears Lo be irrelevant to grammar.” [Chomsky56 pp. 116] 
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(498) Arc the approximations revealing? 
(499) What are reasonable approximations? 


We have attempted to respond to both points. First, they are revealing because they suggest a number of 
crucial differences between competence and performance. For example, Lasnik’s Noncoreference Rule is an 
impractical idealization; a more realistic approximation (using the A-over-A carly closure principle) predicts 
certain coreferential possibilitics which may actually reflect the real empirical facts more accuratcly than 
Lasnik’s idealization. We have discussed many other constructions which are similar in this respect, such as: 


center-embedding, crossed dependencies and garden paths. 


Chomsky’s second criticism is also well-taken; it is very difficult to find independently motivated 
approximations. He rightly criticizes a statistical approach for missing the relevant generalizations. In this 
work, we have attempted to motivate effective approximations without sacrificing linguistic significance. 
YAP captures many linguistic generalizations such as: raising, passive, there-insertion (chapter 6), inversion, 
impcrative (chapter 7), wh-movement (chapter 8), and conjunction (chapter 9).2"7 ‘These gencralizations are 
basically orthogonal to the two processing approximations: FS and determinism. Hence, the approach taken 


here nay be a reasonable compromise between processing complexity and linguistic elegance. 
10.2 Summary 


We have been most concerned with two performance constraints: FS and determinism. Both of these 
constraints reduce the computational power, which is always a welcome step in computer science. ‘The 
question is whether the machine retains cnough oower to parse language. We have demonstrated, by 
implementing YAP, that it is sufficient to parse certain difficult constructions. Furthermore we have 
defended a number of simplifying assumptions as more accurate descriptions of the empirical facts. 
Chapters 1 through 4 discussed some evidence involving center-embedding, crossing dependencies and 
noncoreference. These constructions are provably complex (in competence), and as predicted, they do not 
behave ideally, even at severcly shallow depths. ‘This is suggestive evidence in favor of our simplifying 


assumptions. It appears that a// cxamples of complex behavior are universally unacceptable. 


217. One could rightly criticize these transformations as mere stipulations. A truly revealing theory would cxplain the 
facts. We have described (stipulated) many of Bresnan-Kaplan’s analyses as they are. When deeper explanations are 
found, it may be worthwhile to redesign YAP. 
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‘There are many difficult issues dealing with a particular implementation of the approximations. Chapter 2 
discussed several closure proposals. We finally scttled on a compromise (the A-over-A carly closure principle) 
which has some of the right limiting properties (w.r.t. premature/ineffcctive), but may have some problems in 
certain borderline cases (three deep center-embedded sentences). The limiting cases are far more important; | 
the ficld may not have progressed sufficiently far to make the subtle distinctions necessary for the borderline 


cases. 


Chapter 4 dealt with attachment strategies. We advocated a default mode of operation (attach, predict, and 
then close) which covers most cases although there are many exceptions.’ The exccptions falf into four classes: 
carly closure (chapter 2), non-minimal attachment (chapter 4), pscudo-attachment (chapter 4) and 


transformations (chapters 6-9). 


Pscudo-attachment illustrates the delayed binding approach which is a recurrent theme in this work. The idea 
is to avoid making decisions which may have to be taken back at a later time: ‘his is particularly crucial in a 
deterministic framework which prevents the system from feverting previous commitments. In the 
pscudo-attachment case, the system can decide that it cannot detide how to attach, and hence it attaches both 


ways. 


The delayed binding approach is also central to feature manipulation (chapter 5). An alternative approach 
would try cach feature value combination non-dcterministically until it found a combination which doesn’t 
violate any agreement constraints. This can be very time consuming. YAP'’s approach is a constraint 
propagation technique; it applies the constraints themselves to the fstructure. ‘The difference between the two 
approaches becomes apparent when the constraints underdetermine the final outcome, such as (500)-(501). 
YAP makes a single deterministic pass; it is no harder to search an underdetermined fstructure than any 
other. A non-deterministic Parser, on the other hand, has to search the fstructure once for cach combination 
of values; the underdetermined case requires much more time because there are more combinations of values. 


(500) 1 put it down. underdetermined tense 
(501) The deer Ieft. underdetermined number 


The lexicalist position is very compatible with a delayed binding approach. Although it is possible to write a 
deterministic transformational grammar (as Marcus did), we have found the Icxicalist position more 
sympathetic with the notion of constraints, which is crucial in our formulation of delayed binding. For 
cxamplc, both approaches have a mechanism for "raising" understood subjects as in (502); Bresnan and 
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Kaplan use the constraint equation, subfup) = subj(xcomplup) where Chomsky uses the transformation 
move-np. Bresnan-Kaplan’s constraint cquations fall rather naturally into a constraint propagation 
framework; it might require some ingenuity to reformulate Chomsky’s movement as a constraint. Although it 
is probably possible to reformulate movement in this way, Bresnan-Kaplan's formulation requires little 


modification to fit into a constraint propagation framework. 


(502) John; seems x; to be a nice guy. 


In summary, we have proposed that a deterministic FS parser is sufficient to parse natural language without 
sacrificing linguistic generalizations. To justify this claim, we have designed yet another parser (YAP) which 
encodes many of Bresnan-Kaplan’s analyses in a deterministic FS framework. Although there are many 
unsolved problems (i.c. lexical ambiguity, syntactic/semantic interaction, ...), we have demonstrated 
plausibility for the underlying design which incorporates both performance (FS and determinism) and 


competence (Bresnan-Kaplan’s lexical framework). 
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Appendix | 


Sentences (503)-(536) were taken from Appendix D [Marcus79]., ‘These examples illustrate passive, raising, 
there-insertion, some lexical ambiguity (thal, meet and. Schedule), aux-inversion, imperative and 
wh-movement. YAP can parse all of them except (534) which is unacceptable. Chapter 8 discusses this 


sentence in more detail. 


(503) I told that boy that boys should do it. 
(504) ‘The jar seems tv be broken, 
(505) ‘There seems to be ajar breken. 
(506) { wanted John to do it. 

(507) I want to do it. 

(508) I persuaded John to do it. 


(509) There seems to have been a meeting scheduled for Friday. 


(510) Schedule a meeting for Friday. 

(511) Is there a meeting scheduled for Friday? 

(512) A mecting seems to have been scheduled for Friday. 
(513) I told the boy that i saw Sue. 

(514) I told Suc you would schedule the meeting. 

(515) I told the girl that you would schedule the meeting. 
(516) The boy who wanted to meet you scheduled the mecting. 
(517) The boy who you met scheduled the meeting. 
(518) Who did John see? 

(519) Who broke the jar? 

(520) What did Bob give to Suc? 

(521) Who did Bob give the book? 

(522) Who did Bob give the book to? 

(523) I promised John to do it. 

(524) Who did you say that Bill told? 

(525) You promised to give the book to John. 

(526) Who did you promise to give the book to? 

(527) Who did you promise to schedule the meeting? 
(528) Who did you say scheduled the meeting? 

(529) Who did you persuade to do it? 

(530) What did you give Suc yesterday? 

(531) Who did you ask to schedule the meeting? 


that diagnostic 
passive, subject raising 
there-insertion 

object rasing 


imperative 
aux-inversion 


wh-movement 
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(532) Who do you want to give a book to tomorrow? 

(533) Who did you want to give a book to Sue? 

(534) #1 gave the boy who you wanted to give the books to the books? 
(535) Who did you promise to give the book to tomorrow? 

(536) Who did you promise to give the book to Sue tomorrow? 


YAP can also parse the following conjunction sentences. ‘These sentences were selected to iNustrate YAP’s 
abilitics, both positive and negative. Many of these sentences may be unacceptable and/or ungrammatical for 
reasons which YAP docs not consider. For example, YAP docs.no pragmatic analysis; (540) is syntactically 
well-formed even though it may sound somewhat odd. Similarly, (541) is probably ungrammatical because 
the trace has conflicting case; it reccives objective case from the first conjunct and oblique case from the 
second. It would be simple enough to change the grammar accordingly. Finally, (542) demonstrates a real 
problem with YAP’s formulation of right node raising; YAP does not require the missing noun phrase to 
“match” with the right most noun phrase in the second conjunct. Although there are some problems with 


YAP’s formulation of conjunction, it demonstrates some real progress. 


(537) Which boys and girls went? 

(538) Which boys and which girls went? 

(539) Which boys went to the ball and took the jar? 
(540) Which boys went to the ball and into it? 
(541) What boy did bill look at and give a ball to? 
(542) Bob looked at and gave a ball to the boy. 
(543) Bob gave Bill a ball and John a jar. 

(S44) Bob saw Bill and Sue Mary. 

(545) I want Bill, Bob, and John to be nice. 


The following sentences were taken from a homework problem given by Ken Hale last fall. The first set are 
all grammatical; the second violate island conditions and, hence are ungrammatical. YAP can parse all the 
grammatical ones and none of the ungrammatical ones. Scc the discussion of island phenomena in chapter 8. 


(546) Who should I ask where I can get a copy of Aspects? 
(547) What is it expected that Max will work on next? 
(548) What do you expect that Max will work on next? 
(549) What is Max expected to work on next? 

(550) What do you expect Max to work on next? 

(551) Who is expected to work on case-marking next? 
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(552) Who saw what? 

(553) | wonder who bought what. 

(554) John wonders where to put what. 

(555) John wonders what to put where. 

(556) What does John want to put where? 
(557) Where does John want to put what? 
(558) Who did you find a photograph of? 

(559) It is believed John has won the clection. 
(560) John is believed to have won the elcction. 


(561) *Who docs John wonder where Bill saw? 
(562) *What did you ask me where you could buy? 
(563) *What is expected that Max will work on next? 
(564) *What is expected Max to work on next? 
(565) *What did who see? 

(566) *! wonder what who bought? 

(567) *What does John wonder where to put? 

(568) *Where does John wonder what to put? 

(569) *What docs John wonder to put where? 

(570) *Where docs John wonder to put what? 

(571) *Who do you know the man that married? 
(572) *Who did you hear a rumor that John betrayed? 
(573) *Who did you find a copy of a photograph of? 
(574) *John is believed has won the election. 

(575) *John seems won the election. 


The following illustrate some other gencralizations: 


(576) It secms likely that John would be sitting. 
(577) There seems to bea table in the kitchen. 


(578) That I might take a ball scems likely. 
(579) For mc to take a ball seems nice. 
(580) To take a ball seems nice. 


(581) I wonder what to do. 
(582) | wonder what he should do. 
(583) | wonder what should have been done. 


Appendix | 


it-extraposilion 
there-insertion 


sentential subjects 


embedded questions 
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(584) The ball, he took. topicalization 


We have said very little regarding lexical ambiguity, although there are a few marked rules to cover some 
simple cases. There is one rule to distinguish an auxiliary from a main verb and another to separate the 
various uses of that (a complementizer, a relative pronoun, a normal pronoun, and a determiner). ‘The first 
rule was discussed in chapter 7. Neither rule is particularly clegant; Mitne is working on more attractive 
solutions to the lexical ambiguity problem. | 


(585) Have the boys take the ball! auxiliary diagnostic 
(586) Have the boys taken the ball? 

(587) Which boys were the girls taking to the ball? 

(588) Which boys have the girls take the jars? 

(589) Which boys have the girls taken to the ball? 


(590) I know a man that was nice. that diagnostic 
(591) I know that was nice. 

(592) [ know that that was nice. 

(593) I know that boys are nice. 

(594) I know that boy is nice. 

(595) I know that he is nice. 

(596) That he is nice is a fact. 

(597) That that boy is nice is a fact. 
(598) That that is nice is a fact. 

(599) Who do you believe that was? 
(600) Who do you belicve that that was? 
(601) Did you believe that? 

(602) Did you believe that was him? 
(603) Did you believe that that was him? 
(604) Did you believe he did that? 


We discussed pscudo-attachment bricfly in chapter 4 and pscudo-gaps in chapter 8. (605), (606) and (608) 
illustrate the phenomena; (607) and (609) are near misses (they have only onc attachment/gap). 


SA ty 
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(605) He scems nice to her. pseudo-attachment 
(606) Put the box on the table in the kitchen. 

(607) Put the box on the table. near miss 
(608) Which boys docs he want to see? _ pseudo-gaps 
(609) Which boys docs he.want to take? near miss 


We have been very concerned with stack allocation. (610)-(612) illustrate some borderline center-embedded 
sentences.2/8 YAP does require onc less stack cell for (610) than the other, although. the reason is very 
complex. We don't have enough confidence in the details to trace though the entire explanation. The 
generalization seems to be that a complement is less acceptable in. the’ most deeply embedded clause 
[Cowper76 pp. 71]. YAP finds deeply embedded complements more difficult because it is hard to saintingunsh 
them from relative clauses without storing the entire sentence on the stack. 


(610) The possibility that the man who | hired is incompetent worrics me. 
(611) #'The man who the possibility that students arc dangerous frightens is nice. 
(612) #The man who the possibility that { am dangerous frightens is nice. 


YAP can also parse the following right branching sentences. (616) is somewhat problematic because the two 
that’s arc disambiguated in the wrong order. Hence lop- of fis attached to numor. These diagnostics are not 
well understood. 


(613) It might secm likely that it would seem likely that he is nice. 

(614) Did you hear a rumor that there was a possibility that he might say that I am nice? 
(615) Did you hear a rumor that there was a possibility that he might tell me? 

(616) Did you hear a rumor that there was a possibility that he might tell me of? 

(617) Did you hear a rumor that there was a possibility that he might tclt me of it? 

(618) Did you hear a rumor that it would scem likely that he is nice? 

(619) Did you hear a rumor that John wondered who said that I am nice? 


218. The first two arc taken from [Cowper76}. 
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Appendix II - An Example 


This appendix shows the derivation of (620). The final output (the cstructure and the fstructurc) are given as 
(621) and (622) below.2!9 


(620) Was the boy likely to sit? 


(621) CSUBJ: [(NP-) the boy] cstructure 
CHEAD: [(NP) the boy] 
CSPEC: [(DET) the] 
CHEAD: [(N) boy] 
CHEAD: [(VP) was likely to sit] 
CHEAD: {(V) was] 
CXCOMP: [(AP-) likely to sit] 
CHEAD: [(AP) likely to sit] 
CHEAD: [(A) likely] 
CXCOMP: {(VP-) to sit] 
CCOMP: [(COMP) to] 
CHEAD: [(VP) sit] 
CHEAD: [(V) sit} 


(622) FSUBJ: [(NP-) the boy] Sfstructure 
FSPEC: [(DET) the] 
FXCOMP: [(AP-) likely to sit} 
FSUBS: [(NP-) the boy] 
FSPEC: [(DET) the] 
FXCOMP: {(VP-) to sit] 
FSUBJ: [(NP-) the boy] 
FSPEC: [(DET) the] 


sentence: was the boy likely to sit? initial state 


219. This source was produced by a slightly older version of YAP. Nevertheless, it should still be highly informative. 
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input pointer: LIKELY TO SIT? 
((S)} 


[(V) was] 
{(DET) the] 
{(N) boy] 


YAP will undo the inversion, but first it has to parse np- the boy] to trigger the marked inversion rule. This is 
accomplished by the rule: APPLY-DEFAULT-ATTENTION-SHIFT. No rule of higher priority can apply 
because up] is looking for a subject, not a verb. 


input pointer: TO SIT ? 
((S)} 

(CV) was] 
==WALL== 
[(DET) the] 

[(N) boy] 

((A) likely] 


The determiner in down! triggers a marked rulc to predict a noun phrase: CREATE-NP-1. 


input pointer: TO SIT? 
((S)] 

I(V) was] 
==WALL== 

{(NP)] 

(DET) the} 

{(N) boy] 

[(A) likely] 


The NP is attention shifted to allow [4,, the] and [, boy] to attach. The next three snap-shots show the 
attention shift and two attachments. 
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input pointer: TO SIT ? 

((S)) 

((V) was] 

{((NP)} 

= =WALL== 

(DET) the] 

{(N) boy} 

[(A) likely] 

About to run: APPLY-DEFAUL'T-PS-ATTACHMENT 


input pointer: SIT ? 
((S)] 

[(V) was] 

(NP) the] 


[(N) boy] 

[(A) likely] 

[(COMP) to] 

About to run: APPLY-DEFAULT-PS-ATTACHMENT 


input pointer: ? 
K(S)] 

[(V) was] 

[(NP) the boy] 
==WALL== 
[(A) likely] 
[(COMP) to] 
[(V) sit] 


Now [np the boy] has all of its children, but it doesn’t have a mother yet. It will be returned to the lower 
buffer, so it can find its mother. (Slightly contrary to the discussion in chapter 3, ps-close docs an attention 


return if up! isn’t ready to close. In this case, up] can’t close because it doesn’t have a mother. 
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About to run: APPLY-DEFAULT-PS-CLOSURE 


((S)] 

ICV) was] 

= =WALL== 
[(NP) the boy] 
[(A) likely] 
{(COMP) to] 


‘The NP in downl triggers a marked rule (CREATE-NP--1)?% to predict an np-, which is immediately - 
attention shifted, leaving the machine in the following state. ‘Then ps-attach and ps-closc apply producing the 
next two snap-shots. , , 


((S)] 

[(V) was] 

[(NP-)] 

==WALL== 

[(NP) the boy] 

[(A) likely] 

{(COMP) to] 

About to run: APPI.Y-DEFAULT-PS-ATTACHMENT 


KS)] 

[(V) was] 

[(NP-) the boy] 

==WALL== 

[(A) likely] 

[(COMP) to] 

[(V) sit] 

About to run: APPLY-DEFAULT-PS-Cl OSURE 


220. The rule CREATE-NP--/ predicts an np- whereas the rule CREATE-NP-1 predicts an np. 
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There is nothing Ieft to do but attention-return, hoping to trigger some other rule. In this casc, it will enable 


auxiliary inversion. (It should have predicted an ap- first. This indicates a slight problem.) 


((S)] 

[(V) was] 

==WALL== 

[(NP-) the boy] 

[(A) likely] 

{(COMP) to] 

About to run: APPLY-DEFAULT-PS-CLOSURE. 


[(S)] 


[(V) was] 

[(NP-) the boy] 

[(A) likely] 

About to run: AUX-INVERSION 


Now, ps-attach can apply. 


((S)] 


[(NP-) the boy] 

{(V) likely] 

[(A) to} 

About to run: APPLY-DEFAUL-T-PS-ATTACHMENT 


Notice that [np- the boy] was automatically closed, removed from the buffer, after it was attached. In this 
older version, the closure procedure was very much like Kimball's scheme. The current scheme would not 


close this carly; it would leave the np- in up! and then ps-close would apply. 
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[(S) the boy] 

==WALL== 

{(V) was] 

{(A) likely] 

[(COMP) to] 

About to run: PRED-DEFAULT 


This rule selects the appropriate predicate fur down] from the lexicon. 


[(S) the boy] 

==WALL== 

[(V) was] 

[(A) likely] 

[(COMP) to] 

About to run: ATTACH-FSUBJ 


There is a slight problem checking functional constraints with elements to the icft of the head (such as 
subject). Conscquently, they are checked by a marked rule (ATTACH-FSUB)) which fires when up1 has a 
predicate and a subject. (We are currently exploring more clegant alternatives.) 


[(S) the boy] 

==WALL== 

{(V) was] 

[(A) likely] 

[(COMP) to] 

About to run: CREATE-VP-1 


There is a marked rule to build verb phrases bottom-up. (It is probably unnecessary.) With a more 
symmetric default predict rule, it should be possible to climinate most of the marked prediction rulcs 
(CREATE-...). 
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[(S) the boy] 

[(VP)] 

==WALL== 

[(V) was] 

(A) likely] 

[((COMP) to} 

About to run: APPLY-DEFAULT-PS-ATTACHMENT 


YAP finishes the parse using the same techniques, 


[(S) the boy was] 

[(VP) was] 

==WALL== 

[(A) likely] 

[(COMP) to} 

[(V) sit] 

About to run: CREATE-XCOMP-1 


[(S) the boy was] 

((AP-)] 

= =WALL== 

[(A) likely] 

[((COMP) to} 

[(V) sit] 

About to run: PRED-DEFAULT 


Appendix I 
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{(S) the boy was] 

[(AP-)] 

==WALL== 

[(A) likely] 

{(COMP) to] 

[(V) sit] 

About to run: AVPFACH-FSUBJ 


[(S) the boy was] 

[(AP-)] 

==WALL== 

[(A) likely] 

[(COMP) to] 

[(V) sit] 

About to run: CREATE-AP-1 


Notice that the ap- will close when the ap is attached in the next snapshot. 


[(S) the boy was] 
((AP)} 

= =WALL= 
[(A) likely] 
[((COMP) to] 
[(V) sit] 
About to run: APPLY-DEFAULT-PS-ATTACHMENT 
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[(S) the boy was likely] 

[(AP) likely] 

= =WALL== 

[(COMP) to} 

[(V) sit] 

[(PUNCT) 9] 

About to run: CREATE-INF-VCOMP 


[(S) the boy was likely] 

[(VP-)] 

==WALL== 

[((COMP) to] 

[(V) sit] 

{(PUNCT) 9} 

About to run: APPLY-DEFAULT-PS-ATTACHMENT 


((S) the boy was likely to] 

{(VP-) to] 

==WALL== 

[(V) sit] 

[(PUNCT) 9] 

About to run: PRED-DEFAULT 


[(S) the boy was likely to] 

[(VP-) to] 

<= WALLS = 

{(V) sit] 

[(PUNCY) 9] 

About to run: ATTACH-FSUBJ 
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[(S) the boy was likely to] 
[(VP-) to] 

==WALL== 

[(V) sit] 

[(PUNCT) 9] 

About to run: CREATE-VP-1 


[(S) the boy was likely to] 

[(VP)] 

==WALL== 

[(V) sit] 

[(PUNCT) 9] 

About to run: APPLY-DEFAULT-PS-ATTACHMENT 


{(S) the boy was likely to sit] 
==WALL== 
[(PUNCT) ?] 
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